Writable Streams in Node.js: A Practical Guide


In Node.js, there are 4 primary types of streams: readable, writable, transform, and duplex. In the previous article, we looked at the readable streams in detail.
Perhaps you've heard something about writable streams or even used them. But there is always a fine line between "I kind of understand how they work" and "I understand how they work".
After reading this article, you should be able to use writable streams with confidence.
What are Writable Streams?
As the name suggests, writable streams are meant to write some data to the destination. There is a lot to unpack in this statement.
First, what data do we pass to the writable stream, and where does it come from? It can be anything that JavaScript has access to. For example, if you create a string in a JavaScript application and save it as a variable, you can access it directly.
const name = 'John';
// We can get the name value right away
console.log(name);
However, the data that you want to work with is not always accessible right away. You must make the data accessible to JavaScript first, and only then can you use it.
import { createReadStream } from 'node:fs';
const stream = createReadStream('file-name.txt');
stream.on('data', (chunk) => {
  // Now we have access to the data from the file
});
In this example, the data we want to work with is inside the file-name.txt file. This file resides on the file system, and JavaScript doesn't have direct access to it. That's why we first need to make the file data available in JavaScript using the createReadStream function.
If you're not comfortable with readable streams yet, check out the previous article to get a better understanding of them.
Okay, the part about the data we write into the stream should be clear now. The next question is, what is the destination, and what does it mean that a stream writes data to the destination?
The destination can be anything:
Database
Network socket
File
S3 cloud storage
Standard output
Other streams
JavaScript structures like arrays and objects
The last point is a bit of a stretch, but you get the idea. The destination can be virtually anything. In fact, the standard output of your process is itself a writable stream, as the example below shows.
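For instance, process.stdout is a writable stream that Node.js provides out of the box, so we can call its write method directly (the text here is arbitrary):
// process.stdout is a built-in writable stream.
// Calling write() sends the data to the terminal instead of a file.
process.stdout.write('Hello from a writable stream!\n');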
This model looks similar to readable streams in the sense that we have 3 main parts involved:
The data
The stream
The destination
The difference is that readable streams bring data from anywhere into your JavaScript application, while writable streams take data that is available in your application and write it anywhere.
Writing data to the destination of a writable stream
We use the write method to write data to the destination.
Here is how it works:
import { createWriteStream } from 'node:fs';
const writableStream = createWriteStream('input.txt');
writableStream.write('Hello World!');
In this example, we create a writable stream with the input.txt file as its destination. Whenever we write something to that stream, it gets transferred into the input.txt file.
After calling the write method, we can open the input.txt file and see that it now contains the Hello World! text.
Note: When writing data to a stream, you have to be aware of the backpressure mechanism. It prevents memory overflow by controlling the rate at which data is written into the stream. You'll learn more about it in the upcoming article where we dive deep into how backpressure works in writable streams.
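As a quick preview of that mechanism: write returns false once the stream's internal buffer is full, and the stream emits a drain event when it is ready to accept more data. A minimal sketch of reacting to that signal might look like this (the file name and data are just placeholders):
import { createWriteStream } from 'node:fs';
const writableStream = createWriteStream('input.txt');
// write() returns false when the internal buffer is full.
const canWriteMore = writableStream.write('Hello World!');
if (!canWriteMore) {
  // Wait for the 'drain' event before writing the next chunk.
  writableStream.once('drain', () => {
    writableStream.write('Next chunk of data');
  });
}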
While it might look OK to work with the write method manually, imagine if we had a slightly more complicated setup.
import { createReadStream } from 'node:fs';
import { S3WritableStream } from 'utils/s3-writable-stream';
const s3WritableStream = new S3WritableStream('s3-destination-url');
const readableStream = createReadStream('input.txt');
readableStream.on('data', (chunk) => {
  s3WritableStream.write(chunk);
});
Doesn't look too bad so far, right? Now, let's factor in that we have to handle errors and cleanups for each stream properly.
import { createReadStream } from 'node:fs';
import { S3WritableStream } from 'utils/s3-writable-stream';
const s3WritableStream = new S3WritableStream('s3-destination-url');
const readableStream = createReadStream('input.txt');
readableStream.on('data', (chunk) => {
  s3WritableStream.write(chunk);
});
s3WritableStream.on('error', () => {
  // Clean up the resources related to the stream.
  // Perhaps you want to close the other streams at this point.
});
readableStream.on('error', () => {
  // Clean up the resources related to the stream.
  // Perhaps you want to close the other streams at this point.
});
As you can see, once we start building more or less production-grade applications, we have to think about proper error handling and resource cleanup.
When it comes to this, working with multiple streams by calling the write method becomes way too verbose and repetitive. There is a better way to do it: building data flows using functions like pipeline and pipe.
But before diving into data flows, I want to address the strange-looking writable stream used in the code examples, named S3WritableStream.
Creating a custom writable stream
This strange-looking stream is a custom writable stream. It is created to handle a specific pattern of using a writable stream.
In our case, S3WritableStream is designed to handle write operations to an S3 bucket.
We don't need to write all of the connection and authentication logic. The custom stream handles all of it. The only thing we need to specify is the URL where we want to store the data.
The benefit of such custom streams is the same as creating a function, class, or any other reusable unit: we encapsulate the complex logic of handling the S3 workflow inside the stream and can reuse it across the project later on.
Here is what a pseudo-implementation of S3WritableStream might look like:
import { Writable } from 'node:stream';
class S3WritableStream extends Writable {
  #url;
  constructor(url) {
    super();
    this.#url = url;
  }
  _write(chunk, encoding, callback) {
    // Logic to write data into the S3 bucket.
    // Call callback() when the chunk is persisted, or callback(error) on failure.
  }
  _final(callback) {
    // Logic to finalize the stream workflow,
    // e.g. flush any buffered data, then call callback().
  }
  _destroy(error, callback) {
    // Handle the destroy event by cleaning up resources
    // and forwarding the error to callback(error).
  }
}
We can encapsulate a lot of low-level details in a custom stream abstraction.
While custom streams can significantly simplify a particular type of workflow, we still have to think about combining multiple streams properly to build a data flow.
Building data flows with Writable Streams
Writable streams are powerful, but they get to the next level when we start building data flows (aka pipelines) using them.
For example, we can combine readable and writable streams into a single pipeline. By doing that, data from the readable stream gets automatically transferred into a writable stream.
Here is a simple example of piping two streams.
import { createReadStream, createWriteStream } from 'node:fs';
const readableStream = createReadStream('input.txt');
const writableStream = createWriteStream('output.txt');
readableStream.pipe(writableStream);
Here, we call the pipe method of a readable stream and pass the writable stream as the destination where the data should be forwarded.
No need to listen for data events on the readable stream and call write in a callback.
We can achieve almost the same result using the pipeline function.
import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream';
const readableStream = createReadStream('input.txt');
const writableStream = createWriteStream('output.txt');
pipeline(readableStream, writableStream);
What is the difference between them? Why do we need multiple functions that do the same thing? While they might look similar, there is a huge difference. Let's get into the details.
pipe doesn't have a promise-based API
At this point, promises are the standard for working with asynchronous operations in Node.js.
It is way more convenient to use async/await syntax whenever possible. Thanks to it, we can read async code as a series of synchronous operations.
Because of that, it is much easier to track the logic flow compared to events.
import { createReadStream, createWriteStream } from 'node:fs';
const readableStream = createReadStream('input.txt');
const writableStream = createWriteStream('output.txt');
readableStream.pipe(writableStream);
console.log('Operation finished');
In the case of using pipe, we'll see the console log output first, and only then will the operation finish.
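If you still need to know when the pipe-based version is done, the closest option is to listen for the finish event on the writable stream, which fires once all the data has been flushed to the destination. A small sketch reusing the same file names:
import { createReadStream, createWriteStream } from 'node:fs';
const readableStream = createReadStream('input.txt');
const writableStream = createWriteStream('output.txt');
readableStream.pipe(writableStream);
// 'finish' is emitted after the writable side has flushed all of its data.
writableStream.on('finish', () => {
  console.log('Operation finished');
});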
Compare it to using the pipeline function with the promises API.
import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
const readableStream = createReadStream('input.txt');
const writableStream = createWriteStream('output.txt');
await pipeline(readableStream, writableStream);
console.log('Operation finished');
Poor error handling by the pipe method
When working with streams, you shouldn't forget about proper error handling.
Both pipe and pipeline can handle errors, but the way they do it differs significantly.
readableStream
  .pipe(transformStream)
  .pipe(writableStream)
  .on('error', (e) => {
    // Handle the error
  });
The biggest catch here is that on('error') only catches errors from the writableStream (the last one in the chain).
If you want to handle errors properly for each of the streams, you have to add error listeners for each of the streams involved in the pipeline.
readableStream
  .on('error', (e) => {
    // Handle error
  })
  .pipe(transformStream)
  .on('error', (e) => {
    // Handle error
  })
  .pipe(writableStream)
  .on('error', (e) => {
    // Handle the error
  });
Doesn't look too nice. Now let's compare it with the pipeline API.
try {
  await pipeline(
    readableStream,
    transformStream,
    writableStream
  );
} catch (error) {
  // Handle error
}
Unlike pipe, the pipeline function is able to handle errors for each of the streams involved in the pipeline. How cool is that?
The pipe method doesn't clean up resources properly
You probably noticed that we're here to roast the pipe method.
The next problem it has is the absence of a proper mechanism to clean up resources.
It means that if one of the streams involved in a pipeline errors out, the pipe method doesn't close the other streams automatically, leaving them and their resources hanging in memory.
import { Transform } from 'node:stream';
// Using pipe() - resources aren't properly cleaned up
const transform = new Transform({
  transform(chunk, encoding, callback) {
    if (someCondition) {
      callback(new Error('Something went wrong'));
      // Other streams in the pipeline stay open!
      return;
    }
    callback(null, chunk);
  }
});
transform.on('error', (error) => {
  // Have to manually clean up streams.
  writableStream.destroy();
  readableStream.destroy();
});
readableStream
  .pipe(transform)
  .pipe(writableStream);
You need to track such cases when working with pipe and always clean up the resources involved in the pipeline manually for every possible error inside the pipeline.
On the other hand, the pipeline function handles it automatically.
// Using pipeline() - automatic cleanup of all resources
try {
  await pipeline(
    readableStream,
    transform,
    writableStream
  );
} catch (error) {
  console.error('Error:', error);
  // All streams are automatically destroyed
  // No memory leaks or hanging resources
}
Conclusion
Writable streams are responsible for getting data from your Node.js application and transferring it to a destination that often lies outside of your application.
We have several ways of writing data into a writable stream. The most fundamental one is the write method, which we call on the writable stream, passing the data to be written as an argument.
We can customize how exactly write and the other methods of a writable stream work by creating a custom stream that extends the Writable class.
But even having a custom writable stream won't help us solve the problem of building complex data flows, where we want to transfer data from one source to another without much hassle. That's where the pipe and pipeline functions come in handy.
While both pipe and pipeline have the same end goal of building data pipelines, the implementation details are quite different.
In general, pipeline has a much simpler API and handles errors and resource cleanup much better compared to pipe. Most of the time, you should stick with pipeline.