Kafka's Design - A Masterclass in Systems Engineering


Kafka isn’t just “a message queue.” It’s a distributed, high-throughput commit log that moves ridiculous amounts of data with remarkable efficiency.
When I started digging into its internals, I expected to find some clever Java code.
What I found instead was a deep appreciation for operating system fundamentals — the kind of design where you can tell the authors knew that the fastest code is the code you don’t run.
Zero-Copy networking with sendfile()
Kafka brokers often need to send large chunks of log data to consumers — potentially gigabytes at a time.
Instead of reading the data into JVM memory and then writing it back out to the socket, Kafka uses the Linux sendfile() syscall.
sendfile() moves data directly from the OS page cache to the network socket, skipping user space entirely.
Benefits:
No extra memory copies between kernel and user space.
Fewer syscalls (read + write replaced with one).
Lower CPU usage, so the broker can handle more clients.
This is the same trick that makes web servers like Nginx fly — and Kafka applies it to event streaming.
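To make this concrete, here is a minimal sketch of how a JVM process can trigger sendfile(): Java's FileChannel.transferTo() is the NIO call that maps to sendfile() on Linux. The class and method names below are illustrative, not Kafka's actual code.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: stream a log segment to a consumer's socket without
// pulling the bytes into the JVM heap. On Linux, transferTo() is backed by
// sendfile(), so the data moves page cache -> socket inside the kernel.
public class ZeroCopySend {

    static long sendSegment(Path segmentFile, SocketChannel consumerSocket) throws IOException {
        try (FileChannel segment = FileChannel.open(segmentFile, StandardOpenOption.READ)) {
            long position = 0;
            long remaining = segment.size();
            while (remaining > 0) {
                // transferTo() may transfer fewer bytes than requested, so loop.
                long sent = segment.transferTo(position, remaining, consumerSocket);
                position += sent;
                remaining -= sent;
            }
            return position; // total bytes handed to the kernel
        }
    }
}
```

Contrast this with a read-into-buffer-then-write loop: that path pays two extra copies and roughly twice the syscalls for every chunk of data sent.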
The Power of Sequential Disk Access
Most people think “disk is slow” — but that’s only true for random access.
Sequential reads and writes, especially on modern SSDs and even HDDs, are dramatically faster; in some cases sequential disk access can even outpace random access to memory.
Kafka’s log-structured storage model means writes are always appended at the end of the file. Consumers, too, read sequentially most of the time.
Why this matters:
The OS can prefetch upcoming data into cache (read-ahead).
The page cache gets heavily reused for hot segments.
Disk head movement (on spinning drives) is minimized.
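To make the access pattern concrete, here is a rough sketch of a sequential fetch (class and method names are hypothetical, not Kafka internals): position once at the consumer's byte offset, then read forward in large chunks. In the real broker the consumer-facing transfer goes through the zero-copy path shown earlier; this sketch only illustrates why the kernel's read-ahead has such an easy job.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of a sequential fetch: start at the consumer's byte
// position and read forward in big chunks. Because access is strictly linear,
// the kernel's read-ahead keeps the next chunk warm in the page cache.
public class SequentialFetch {

    static ByteBuffer fetch(Path segmentFile, long startPosition, int maxBytes) throws IOException {
        ByteBuffer chunk = ByteBuffer.allocate(maxBytes);
        try (FileChannel segment = FileChannel.open(segmentFile, StandardOpenOption.READ)) {
            segment.position(startPosition); // seek once, then read linearly
            while (chunk.hasRemaining() && segment.read(chunk) != -1) {
                // keep filling the buffer until it is full or the segment ends
            }
        }
        chunk.flip(); // make the buffer readable by the caller
        return chunk;
    }
}
```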
Leaning on the OS page cache
Rather than implementing its own in-JVM caching layer for disk data, Kafka fully embraces the OS’s page cache.
The reasoning is elegant: the kernel already knows what’s been recently read or written, and it’s very good at keeping that hot in memory.
Advantages:
Avoids duplicating data in the JVM heap.
Lets Kafka use RAM as a shared resource between all processes.
Keeps garbage collection overhead lower.
This is why a Kafka broker can restart and immediately start serving hot data from memory: the page cache lives in the kernel, so it survives a process restart (though not a machine reboot).
Durability: Disk as a long-term store
Many in-memory systems (like Redis in pure RAM mode) offer lightning-fast performance, but their retention window is limited by available memory.
Kafka’s choice to store data on disk means:
You can retain data for days, weeks, or even months, depending on disk capacity.
Consumers that come online late can still fetch old messages.
System restarts don’t wipe out history.
Paired with replication, this makes Kafka not just a fast message broker but also a reliable data store for streaming.
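To ground this, here is a hedged sketch using Kafka's standard AdminClient API (the broker address and topic name are assumptions): create a topic that keeps a week of data in 1 GiB segments, replicated three ways. Anything still inside that retention window can be replayed by a consumer that shows up late.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

// Sketch: a topic that retains a week of history on disk, so late consumers
// can replay anything still inside the retention window.
public class CreateRetainedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // Hypothetical topic: 6 partitions, replication factor 3.
            NewTopic topic = new NewTopic("clickstream-events", 6, (short) 3)
                    .configs(Map.of(
                            TopicConfig.RETENTION_MS_CONFIG, String.valueOf(7L * 24 * 60 * 60 * 1000), // 7 days
                            TopicConfig.SEGMENT_BYTES_CONFIG, String.valueOf(1024 * 1024 * 1024)));    // 1 GiB segments
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```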
Bypassing the JVM’s buffered streams for disk writes
Kafka doesn’t use the JVM’s default I/O abstractions for writing to disk.
Why? Because those abstractions often introduce extra buffering and memory copies, and they don’t give fine-grained control over how data lands on disk.
Instead, Kafka uses Java NIO's FileChannel, which allows writing directly to the file system and integrates seamlessly with sendfile().
It’s closer to the metal — fewer middlemen between the broker and the OS.
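A minimal sketch of what that buys you (the class below is illustrative, not Kafka's actual log code): appends go through a FileChannel with no intermediate stream buffer, and force() gives explicit control over when the OS flushes bytes to the device, something buffered-stream wrappers hide.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative appender: writes go straight to the OS write path via
// FileChannel, and force(true) is the explicit "flush to disk now" knob.
public class SegmentAppender implements AutoCloseable {
    private final FileChannel channel;

    SegmentAppender(Path segmentFile) throws IOException {
        this.channel = FileChannel.open(segmentFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    long append(byte[] record, boolean fsync) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            channel.write(buf);  // no BufferedOutputStream layer in between
        }
        if (fsync) {
            channel.force(true); // flush file data and metadata to the device
        }
        return channel.size();   // byte position of the segment's tail
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

In practice Kafka leaves most flushing to the OS and relies on replication for durability, which is why the fsync flag here is a choice rather than a default.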
Other notable tricks
Batching everywhere — Kafka batches messages not only on the producer side but also in disk writes and network sends. This amortizes syscall overhead (see the producer sketch after this list).
Replication pipeline — Leaders and followers exchange data in an efficient streaming fashion, avoiding unnecessary disk flushes when possible.
Retention by segment files — Logs are split into segment files so old data can be deleted just by dropping a file, no in-place cleanup needed.
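For the batching point above, here is a sketch of the producer-side knobs (standard producer configs; the broker address and topic name are assumptions): batch.size caps how many bytes ride in one batch per partition, and linger.ms trades a few milliseconds of latency for fuller batches and fewer network requests.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

// Sketch: tune the producer so many small records ride in one batched request.
public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");       // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);   // up to 64 KiB per partition batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);           // wait up to 10 ms to fill a batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress whole batches, not single records

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000; i++) {
                producer.send(new ProducerRecord<>("clickstream-events", "key-" + i, "value-" + i));
            }
        } // close() flushes any batches still in flight
    }
}
```

Compression is applied per batch as well, so fuller batches also compress better.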
Why This Matters
Kafka isn’t “fast because it’s written in Java.”
It’s fast because it respects the hardware. It understands that modern disks, CPUs, and OS kernels have their own strengths — and it designs around them.
Instead of fighting the JVM's limitations, Kafka sidesteps them when needed. Instead of building complex caching layers, it uses the OS's. Instead of shuffling bytes in user space, it lets the kernel do the heavy lifting.
It’s a reminder that high performance often comes from knowing what not to do.
For more depth, see the Design section of the official Apache Kafka documentation.