Large File Uploads: Choosing Between Streaming and Buffering

Handling large file uploads is a common challenge when building systems that deal with media, backups, or user-generated content. Once your files start exceeding your machine’s RAM (or even its disk, in some cases), naive upload methods break down fast. This article explores the approaches you can use to upload large files, especially when the file size exceeds available RAM, and how each strategy behaves in VM-based and serverless environments.

We’ll walk through three common techniques:

  1. In-memory buffering

  2. Temporary disk buffering

  3. Streaming uploads

1. In-Memory Buffering

How it works:
You load the entire file into memory (e.g., a Buffer or byte array) before uploading it to the provider.

Pros

  • Simple and fast for small-to-medium files

  • Code is easier to write/debug

Cons

  • Fails or crashes if the file is larger than available memory

  • Not suitable for serverless environments where memory is capped (e.g., 512 MB)

Use when

  • Files are small or predictable in size

  • You control the infrastructure and can scale memory

  • You need to inspect the whole file in memory

Code Example

func uploadInMemory(ctx context.Context, file multipart.File, fileSize int64, bucketName, objectName string) error {
    // Read the whole file into memory; io.ReadFull keeps reading until the
    // buffer is full (a single file.Read is not guaranteed to fill it).
    buffer := make([]byte, fileSize)
    if _, err := io.ReadFull(file, buffer); err != nil {
        return err
    }

    client, err := storage.NewClient(ctx)
    if err != nil {
        return err
    }
    defer client.Close()

    wc := client.Bucket(bucketName).Object(objectName).NewWriter(ctx)
    if _, err := io.Copy(wc, bytes.NewReader(buffer)); err != nil {
        return err
    }
    return wc.Close()
}

2. Disk-Based Temporary Buffering

How it works:
You store the file on disk temporarily (e.g., /tmp) before uploading it in full.

Pros

  • Works with files larger than RAM

  • More stable than in-memory buffering in constrained environments

Cons

  • Requires disk space, which is ephemeral on serverless (e.g., Google Cloud Functions 2nd gen caps /tmp at 2 GB)

  • Slower than memory due to I/O

Use when

  • You’re running in a VM or container with disk access

  • The file is too large for memory but fits on local disk

Code Example

func uploadFromDisk(ctx context.Context, file multipart.File, bucketName, objectName string) error {
    tempFile, err := os.CreateTemp("", "upload-*.tmp")
    if err != nil {
        return err
    }
    defer os.Remove(tempFile.Name())
    defer tempFile.Close()

    // Spool the entire upload to the temp file so it never has to fit in memory.
    if _, err := io.Copy(tempFile, file); err != nil {
        return err
    }

    // Rewind so the upload below reads the temp file from the start.
    if _, err := tempFile.Seek(0, io.SeekStart); err != nil {
        return err
    }

    client, err := storage.NewClient(ctx)
    if err != nil {
        return err
    }
    defer client.Close()

    wc := client.Bucket(bucketName).Object(objectName).NewWriter(ctx)
    if _, err := io.Copy(wc, tempFile); err != nil {
        return err
    }
    return wc.Close()
}

3. Streaming Upload

How it works:
Stream the file directly from the input (request or file source) to the storage provider without buffering the whole file.

Pros

  • Handles truly massive files

  • Minimal RAM usage

  • Ideal for serverless and modern cloud-native apps

Cons

  • Slightly more complex implementation (you manage streams, backpressure, etc.)

  • Limited control over retries if the connection drops mid-upload

Use when

  • File size is unpredictable

  • You care about performance, cost, and stability

  • You want maximum scalability

Code Example

func uploadStreamed(ctx context.Context, file multipart.File, bucketName, objectName string) error {
    client, err := storage.NewClient(ctx)
    if err != nil {
        return err
    }
    defer client.Close()

    wc := client.Bucket(bucketName).Object(objectName).NewWriter(ctx)
    if _, err := io.Copy(wc, file); err != nil {
        return err
    }
    return wc.Close()
}
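
One practical note on the streaming path: if your handler obtains the file via r.FormFile, net/http has already buffered that part (in memory up to a threshold, then to a temp file) before your code ever sees it, which partly defeats the purpose. To stream straight from the request to the bucket, you can read the parts yourself with r.MultipartReader(). The sketch below is one way this might look; it assumes a shared *storage.Client and a form field named "file", and the 8 MiB ChunkSize is just an illustrative knob that bounds the Writer's internal buffer.

func handleUpload(client *storage.Client, bucketName string) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Read multipart parts directly off the wire instead of letting
        // net/http buffer them first (as r.FormFile would).
        mr, err := r.MultipartReader()
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }

        for {
            part, err := mr.NextPart()
            if err == io.EOF {
                break
            }
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            if part.FormName() != "file" {
                continue
            }

            wc := client.Bucket(bucketName).Object(part.FileName()).NewWriter(r.Context())
            wc.ChunkSize = 8 << 20 // buffer at most ~8 MiB per resumable chunk
            if _, err := io.Copy(wc, part); err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            if err := wc.Close(); err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
        }
        w.WriteHeader(http.StatusCreated)
    }
}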

Conclusion

Handling large file uploads effectively requires different strategies depending on the environment and constraints. In-memory buffering is simple but limited by RAM size, making it unsuitable for serverless environments. Disk-based buffering allows handling of larger files than RAM can accommodate but requires disk access and is slower due to I/O operations. Streaming uploads offer the best solution for handling large, unpredictable file sizes with minimal RAM use, suited for serverless and cloud-native applications. Each approach has its pros, cons, and suitable use cases.

When handling large file uploads, avoid buffering into memory unless absolutely necessary. For modern apps, streaming uploads are the most scalable and efficient method, especially when working within the constraints of serverless environments.

If you need resumability and reliability, Google Cloud Storage's native resumable uploads are used automatically when you write through a Writer. And if you want to reduce backend load altogether, pre-signed URLs are a great client-side strategy.
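
As a rough sketch of the pre-signed URL approach: the backend hands out a short-lived URL and the client PUTs the file straight to Cloud Storage, so the bytes never pass through your servers. The bucket and object names here are placeholders, and it assumes the client can derive signing credentials from its environment; otherwise set GoogleAccessID and PrivateKey on the options.

func generateUploadURL(ctx context.Context, bucketName, objectName string) (string, error) {
    client, err := storage.NewClient(ctx)
    if err != nil {
        return "", err
    }
    defer client.Close()

    // The signed URL lets the client upload directly to the bucket for the
    // next 15 minutes, bypassing the backend entirely.
    opts := &storage.SignedURLOptions{
        Scheme:  storage.SigningSchemeV4,
        Method:  http.MethodPut,
        Expires: time.Now().Add(15 * time.Minute),
    }
    return client.Bucket(bucketName).SignedURL(objectName, opts)
}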

Let the infrastructure work with you, not against your RAM.
