Mastering Client-Side File Downloads with the Streams API

Typically, servers handle the preparation of data for download. But what if you create data on the client side, or combine data from multiple endpoints, and end up with a file that's gigabytes in size? The standard method using a Blob may not be suitable.

In this article, we'll explore how to download big files with client-side JavaScript and the Streams API.

After reading, you'll be able to explain:

  • What a Blob is and what its shortcomings are when dealing with file downloads.

  • How to create a ReadableStream from scratch.

  • How to transfer stream ownership to a Service Worker.

  • How to simulate server file download with Service Worker FetchEvent.

💡
This article is inspired by StreamSaver.js https://github.com/jimmywarting/StreamSaver.js and explains how this library works and how you can use the same concept to build just what you need. Also, check it out if you need a zip library compatible with the Streams API.

Blob file download in the browser

On the client side, a file download can be triggered with an a tag that has a download attribute:

<a href="https://example.com/download" download="archive.zip">Download</a>

When href points to a server location, the download attribute asks the browser to save the resource instead of displaying it (note that a filename provided by the server's Content-Disposition header takes priority over the attribute's value). But href doesn't need to point to a server location; we can use a Blob URL instead.

Blob (Binary Large Object) is a file-like container for data.

This is useful when you are creating content on the client side (rendering on a canvas would be one use case) and then need to download it or display it as an image or video.
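
As a quick illustration (a sketch that assumes the page already contains a canvas element), canvas content can be captured into a Blob like this:

canvas = document.querySelector('canvas');
canvas.toBlob((canvasBlob) => {
    // canvasBlob holds the rendered image and can be passed to URL.createObjectURL
    console.log(canvasBlob);
}, 'image/png');

For the rest of the article, though, a Blob built from plain strings is enough: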

blob = new Blob(['fox', 'jumps', 'over', 'the', 'lazy', 'dog']); 
blobURL = URL.createObjectURL(blob);
// "blob:https://example.com/63fd8457-904a-4c5d-985f-f77c9ea6a58b"
💡
A Blob URL is a reference to a memory block that won't be released until the page is unloaded. When you don't need a blob anymore, release it with URL.revokeObjectURL. See: MDN -> createObjectURL -> Memory management
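
For example, once the Blob URL has served its purpose (say, the download has been triggered), it can be released:

// Allow the memory backing the Blob URL to be reclaimed
URL.revokeObjectURL(blobURL);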

Then you trigger the download by creating an anchor element in JS and clicking it programmatically:

anchor = document.createElement('a');
anchor.download = 'archive.zip';
anchor.href = blobURL;

// trigger download
anchor.click();

Streaming files

Every time you create a Blob URL, memory is allocated. Think about videos or archives that can be gigabytes in size. Putting such big files into application memory can quickly lead to out-of-memory problems.

For this reason, servers usually stream data from storage. This way, only a chunk of data needs to be held in memory at any given time. On the client side, you can do the same using the Streams API.

Introduction to ReadableStream

Even if you have never used the Streams API before, you have almost certainly used the Fetch API.

Consider this example:

response = await fetch('https://example.com/resource');
response.body
//< ReadableStream

It turns out that response.body is a ReadableStream. So instead of awaiting the entire content with response.blob() or response.json(), we can read the data as it's being downloaded:

response = await fetch('https://example.com/resource');
stream = response.body;

// Stream implements async iterator:
// https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#async_iteration
for await (const chunk of stream) {
    console.log('chunk:', chunk);
}

// Alternatively, use a Reader:
reader = stream.getReader();
reader.read().then(function consumeChunk({ done, value }) {
    // Stop when the stream is exhausted
    if (done) {
        return;
    }
    console.log('chunk:', value);
    // Read the next chunk
    reader.read().then(consumeChunk);
});
💡
If you ever tried to consume a Response using json() and blob() at the same time, you know it's prohibited. You can consume a Response only once... unless you use response.body.tee(), which produces two identical copies of the response data that you can consume separately.
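
Here's a minimal sketch of consuming the same body twice via tee() (the URL is just a placeholder):

response = await fetch('https://example.com/resource');
[streamA, streamB] = response.body.tee();

// One branch can be wrapped back into a Response and parsed as JSON...
data = await new Response(streamA).json();

// ...while the other is read chunk by chunk
for await (const chunk of streamB) {
    console.log('chunk:', chunk);
}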

Let's construct a ReadableStream from scratch.

Before we do that, we need an underlying source. This is where we get our data from. For the purpose of this exercise, let's create an async iterator that yields the integers 0 through 9, waiting one second before each value. In real life, this is where you would produce data, or call some API to fetch it.

async function* getZeroToNineIntsIterator() {
    for (let i = 0; i < 10; i++) {
        await new Promise(resolve => setTimeout(resolve, 1000));
        yield i;
    }
}

iterator = getZeroToNineIntsIterator();

Whenever we call iterator.next(), we get a Promise that resolves to an object: { value: number, done: boolean }.
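
For example, each call resolves after roughly one second of waiting:

await iterator.next();
//< { value: 0, done: false }
await iterator.next();
//< { value: 1, done: false }

// ...and once all ten values have been yielded:
await iterator.next();
//< { value: undefined, done: true }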

Now we can feed that data into ReadableStream:

iterator = getZeroToNineIntsIterator();

underlyingSource = {
    // pull() will be called again once the promise it returns resolves
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
        return;
      }
      controller.enqueue(value);
    }
};


stream = new ReadableStream(underlyingSource);
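
To sanity-check the stream, we can consume it the same way as response.body earlier. Keep in mind that reading drains it, so recreate the stream before handing it over for the actual download:

// Logs the values 0 through 9, one per second
for await (const chunk of stream) {
    console.log('chunk:', chunk);
}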

Now that we have a stream we can feed and read from, we need a way to download it. But there's no way to create an Object URL from a stream.

Intercepting the download request in a Service Worker

Once we have a stream, we can simulate what the server does. We'll do it in 4 steps:

  1. Registering a Service Worker.

  2. Transferring stream ownership from main thread to the Service Worker.

  3. Intercepting FetchEvent inside the Service Worker and responding with data from our stream, using event.respondWith.

  4. Clicking a fake download link to start the download.

First, we'll register a Service Worker and transfer the stream ownership.

To pass a stream object to a Service Worker, we use postMessage and the fact that ReadableStream is a Transferable object. This is great because transferring such objects is a zero-copy operation. We literally transfer ownership from the main thread to the worker thread, meaning that after the transfer we can no longer access the stream object from the source thread.

navigator.serviceWorker.register("service-worker.js");

navigator.serviceWorker.ready.then((registration) => {
  // The second argument of postMessage is an array of Transferables.
  registration.active.postMessage({ stream }, [stream]);
});
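
Note that once postMessage runs, the main thread's stream is detached; roughly speaking, this is what you would observe if you tried to use it afterwards:

// The transferred stream is locked and can no longer be read from this thread
stream.locked;
//< true
stream.getReader();
//< Uncaught TypeError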

Now let's write worker code which will save the stream and intercept fetch events.

let stream = null;

// Listen for a message containing the stream
self.addEventListener('message', (event) => {
  if (event.data && event.data.stream instanceof ReadableStream) {
    stream = event.data.stream;
  }
});

self.addEventListener('fetch', (event) => {
  // Ignore if the URL does not match or the stream is not set
  if (new URL(event.request.url).pathname !== '/fake-download' || !stream) {
    return;
  }

  // Create a Response backed by the stream
  const response = new Response(stream, {
    headers: {
      // This header hints the browser to trigger a file download.
      // Normally the server would set it.
      'Content-Disposition': 'attachment; filename="file.txt"'
    }
  });

  // Respond...
  event.respondWith(response);
});

At this point, we have everything ready to actually call the download endpoint we just "created":

anchor = document.createElement('a');
// We open the link in a new tab to avoid navigating away from the current page,
// but the new tab won't actually open once the browser realizes it should
// download the response instead of displaying it.
anchor.target = '_blank';
anchor.href = '/fake-download';

// Trigger the download
anchor.click();