Node.js Streaming Basics: Part 1

When you think of streaming from a non-technical point of view, you naturally think of platforms like Netflix, radio programs, or podcasts. If you are a developer, especially a backend developer, you may sometimes wonder how these platforms are built to let you stream media content such as video or audio seamlessly.

I hope to demystify this concept in this article and a few more to come. In this article, we will cover:

What are streams?
Types of streams in Node.js
A brief tour of Buffer

In order to follow along, I expect that you have a basic understanding of Buffers and Events in Node.js. If you need an article on any of those, feel free to reach out in the comments section.

💡

This article aims to document my learning journey about streaming in Node.js, acknowledging that it is a broad and challenging topic.

What are Streams?

Basically, a stream is a sequence of data chunks made available over time. In the context of Node.js, streams are an alternative technique for accessing data from various sources such as network, files, and more.

In Node.js, a stream provides an interface for streaming data and can be used to handle reading or writing files, network communications, or any kind of end-to-end information exchange in a very efficient way.

With a stream, you handle data in chunks, allowing you to function without needing all the data at once. This is why, when you click on a YouTube video, you can start watching immediately without waiting for the entire video to download, even if it's an 8-hour tutorial.
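To make the idea of chunks concrete, here is a minimal sketch that reads a file piece by piece instead of loading it all at once. The file name stream-basics-sample.txt and the variable names are my own assumptions for illustration; the script creates the file itself in the OS temp directory so it can run anywhere.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sample file: 1 MiB of the letter "a" in the OS temp directory.
const samplePath = path.join(os.tmpdir(), 'stream-basics-sample.txt');
fs.writeFileSync(samplePath, 'a'.repeat(1024 * 1024));

// Resolves with the chunk count and total bytes once the whole file has been read.
const chunkStats = new Promise((resolve, reject) => {
  let chunks = 0;
  let bytes = 0;

  // highWaterMark caps the size of each chunk (64 KiB is the default for fs read streams).
  fs.createReadStream(samplePath, { highWaterMark: 64 * 1024 })
    .on('data', (chunk) => {
      chunks += 1;           // one more piece arrived
      bytes += chunk.length; // only this piece is in memory right now
    })
    .on('end', () => resolve({ chunks, bytes }))
    .on('error', reject);
});

chunkStats.then(({ chunks, bytes }) => {
  console.log(`Read ${bytes} bytes in ${chunks} chunks`);
});
```

The point of the sketch is that the data handler can start working as soon as the first chunk arrives, which is the same idea that lets a video start playing before it has fully downloaded.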

Types of Streams

💡

Remember that we are discussing streaming within the context of Node.js, right?

When learning about streams, you will often encounter two key terms: source and destination. Keep these in mind.

As far as I know, Node.js supports four fundamental kinds of streams:

Readable streams are streams from which we can read data. Basically, they are data sources. Whenever you call fs.createReadStream(), read from process.stdin, or handle the response object passed to the http.get() callback, you are using a readable stream. These give you access to data sources from which you can read data in small pieces.
Writable streams, on the other hand, are data destinations. They enable us to write data into sinks, which can be files, network sockets, or any other output destination. When you use writable streams, you can send data in chunks, which is particularly useful for handling large amounts of data efficiently. For example, when you use fs.createWriteStream() to write to a file, you can write data incrementally without loading the entire content into memory.
Duplex streams enable bi-directional flow of data. They are streams that can be read from or written to. This makes them versatile for scenarios where data needs to be sent and received simultaneously.

In Node.js, duplex streams are instances of the stream.Duplex class, which implements both the stream.Readable and stream.Writable interfaces. An example of a duplex stream is a TCP socket created using the Node.js net module.

💡

Though I am still aiming to understand networking deeply in the future, I am confident that a TCP socket can receive data from a remote server and send data to that server.
Transform streams are a type of duplex stream. What distinguishes them is their ability to modify the data they receive through their writable interface before sending it out through their readable interface.

One practical application that comes to mind for utilizing the capabilities of a transform stream is compression and decompression.

With a transform stream, you can write compressed data into its writable side, have it decompressed, and then read the decompressed data from its readable side.
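The four kinds of streams described above can be seen working together in one small pipeline. This is only a sketch under my own assumptions: the names source, upperCase, sink, and result are made up for illustration, and the gzip/gunzip round trip is there just to show that the built-in zlib streams are transform streams too.

```javascript
const { Readable, Writable, Transform, pipeline } = require('node:stream');
const zlib = require('node:zlib');

// Readable: the data source, here an in-memory list of chunks.
const source = Readable.from(['node ', 'streams ', 'in ', 'chunks']);

// Transform: a duplex stream that modifies what is written to it
// before making it available on its readable side.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  },
});

// Writable: the destination, here an array that collects every chunk it receives.
const received = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk);
    callback();
  },
});

// pipeline() connects the streams and reports the first error, if any.
const result = new Promise((resolve, reject) => {
  pipeline(
    source,
    upperCase,
    zlib.createGzip(),   // built-in transform: compress
    zlib.createGunzip(), // built-in transform: decompress
    sink,
    (err) => (err ? reject(err) : resolve(Buffer.concat(received).toString()))
  );
});

result.then((text) => console.log(text)); // prints "NODE STREAMS IN CHUNKS"
```

I used pipeline() rather than chaining pipe() calls because it forwards errors from every stream to a single callback, which keeps the sketch short.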

It is important to note that streams in Node.js extend the EventEmitter class. This means that they can emit and listen for events, such as data, end, error, and close, which allows for handling asynchronous data flow efficiently.
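Here is a small sketch of those events in action, assuming a readable stream built from an in-memory array with Readable.from(). The names letters, events, and finished are hypothetical; the script simply records each event in the order it fires.

```javascript
const { Readable } = require('node:stream');
const { EventEmitter } = require('node:events');

const events = [];
const letters = Readable.from(['a', 'b', 'c']);

// Streams are event emitters, so listeners are attached with on().
console.log(letters instanceof EventEmitter); // true

letters.on('data', (chunk) => events.push(`data:${chunk}`)); // a chunk is ready
letters.on('end', () => events.push('end'));                 // no more data to read
letters.on('error', (err) => events.push(`error:${err.message}`));

// Resolves with the recorded event log once the stream has closed.
const finished = new Promise((resolve) => {
  letters.on('close', () => {
    events.push('close');
    resolve(events);
  });
});

finished.then((log) => console.log(log.join(' -> ')));
```

Attaching a data listener is what switches the stream into flowing mode, so the chunks start arriving without any explicit read call.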

💡

If you are not familiar with this class, let me know in the comments section and I will write an article on it.

A Brief Tour of Buffer

A readable stream delivers data in small chunks, by default as Buffer objects, and calls the handler registered for the data event whenever a new chunk is available for processing.
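A quick way to see this for yourself is to look at the type of the first chunk a file stream hands you. In this sketch the file name stream-basics-note.txt and the helper firstChunk() are my own inventions for illustration; the second stream sets an encoding, which makes the same stream deliver strings instead of buffers.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sample file in the OS temp directory.
const notePath = path.join(os.tmpdir(), 'stream-basics-note.txt');
fs.writeFileSync(notePath, 'hello streams');

// Hypothetical helper: resolves with the first chunk, then stops the stream.
function firstChunk(stream) {
  return new Promise((resolve, reject) => {
    stream.once('error', reject);
    stream.once('data', (chunk) => {
      stream.destroy();
      resolve(chunk);
    });
  });
}

const rawChunk = firstChunk(fs.createReadStream(notePath));
const textChunk = firstChunk(fs.createReadStream(notePath, { encoding: 'utf8' }));

Promise.all([rawChunk, textChunk]).then(([raw, text]) => {
  console.log(Buffer.isBuffer(raw)); // true: no encoding set, so the chunk is a Buffer
  console.log(typeof text);          // "string": the encoding option decodes chunks for you
});
```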

What are Buffers?

"Buffers in Node.js are a higher-performance alternative to strings." (Mixu's Node Book)

The concept of buffers was one of the challenging concepts for me to understand when I first encountered them (another one was APIs). Strangely, I often find simple topics difficult to grasp and complex ones relatively easy to understand.

Since this is not an article focused on explaining buffers, let's put it simply: a buffer is a temporary storage area in memory with a defined size. You know that you can read from and write to a file, right? A buffer behaves similarly, but while a file is stored on disk, a buffer is stored temporarily in your RAM (or memory).
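The "defined size" part is easy to see in a few lines. This is a minimal sketch with variable names of my own choosing: it allocates an 8-byte buffer and tries to write a 9-character word into it.

```javascript
// Allocate a zero-filled buffer of exactly 8 bytes.
const buf = Buffer.alloc(8);

// write() returns the number of bytes written; whatever does not fit is dropped.
const written = buf.write('streaming'); // 9 characters, but only 8 bytes of room

console.log(written);        // 8
console.log(buf.toString()); // "streamin"
console.log(buf.length);     // 8 (the size never changes)
```

Unlike a string or an array, the buffer did not grow to make room for the last character.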

💡

Remember the definition drawn from Mixu's Node Book above?

Why Are Buffers a Higher-Performance Alternative to Strings?

Unlike strings, which handle sequences of characters, buffers are sequences of raw bytes whose memory is allocated outside the V8 JavaScript heap. This makes them more efficient for handling binary data.

Buffers have a fixed length because each one is allocated as a block of memory of a set size that cannot be resized afterwards. They also carry no encoding of their own: they hold raw bytes, and an encoding only comes into play when you convert between bytes and text. One of the challenges of text manipulation is encoding, as you have to consider the different encoding schemes.

Strings in JavaScript are sequences of characters, and when those characters are stored or transmitted as bytes they can be encoded in various ways, such as UTF-8, UTF-16, or ASCII. Each encoding scheme has its own rules for representing characters, which can introduce complexity and potential errors when manipulating text data.

Buffers, on the other hand, are sequences of bytes, and each byte is a raw piece of data. This raw nature allows buffers to be more precise and efficient when dealing with binary data. Since buffers have a fixed length, you know exactly how much memory is allocated, and you can directly access and manipulate the data at the byte level.
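The difference between characters and bytes shows up as soon as the text contains a non-ASCII character. In this sketch (the sample word and variable names are my own), the same five-character string takes a different number of bytes depending on the encoding, and a single byte is then changed in place.

```javascript
// "héllo", with the "é" written as an escape so the source file's own encoding does not matter.
const text = 'h\u00e9llo';

const utf8 = Buffer.from(text, 'utf8');
const latin1 = Buffer.from(text, 'latin1');

console.log(text.length);   // 5 characters
console.log(utf8.length);   // 6 bytes: "é" takes two bytes in UTF-8
console.log(latin1.length); // 5 bytes: "é" takes one byte in Latin-1

// Individual bytes are plain numbers from 0 to 255.
console.log(utf8[1].toString(16), utf8[2].toString(16)); // c3 a9

// Byte-level manipulation: overwrite "h" (0x68) with "H" (0x48).
utf8[0] = 0x48;
console.log(utf8.toString('utf8')); // "Héllo"
```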

💡

But working with buffers is more complicated than working with strings...

Conclusion

As a backend engineer, it's essential to learn how to work with large datasets, which often include media files containing binary data. Understanding streaming is crucial for efficiently handling this type of data. I hope this article has effectively explained the concept of streaming in a clear and accessible way, encouraging you to explore the topic further.

If you have any questions, feel free to post them in the comments section, and I will do my best to answer.
