How Apache Kafka Works Under the Hood

CHIRAG WADHWA
3 min read

At first glance, Kafka might seem like just another messaging queue, but beneath the surface, it’s a highly optimized, high-performance distributed system designed for real-time data processing. In this blog—the first in a deep-dive series—I’ll share insights I’ve gained from working closely with Kafka’s internals every day as a contributor to the open-source community.

⚡ A Quick Backstory: From Logging to Streaming

Kafka was originally developed at LinkedIn to handle high-throughput activity data and operational metrics. Fun fact: it was named after Franz Kafka because its creators liked the idea of a system that could "write a lot"—and, metaphorically, handle complexity.

Over a decade later, Kafka has evolved into the de facto backbone of modern data infrastructure—used by banks, airlines, tech giants, and even the New York Times to power everything from fraud detection to real-time news delivery.

🧱 The Core Architecture: Kafka’s Building Blocks

Having worked on Kafka internals for over a year now as part of my journey at Confluent and as an open-source contributor, I’ve come to appreciate the elegance (and clever hacks!) in its architecture. Let’s walk through the core components that make Kafka what it is.

🔹 Broker

A Kafka broker is a server that handles all read and write requests. Brokers store partition data and handle replication—so they are very much stateful with respect to data—but they keep no per-consumer state: each consumer tracks its own read position (offset). This division of labor makes brokers Kafka’s workhorses.

🧠 Real-world insight: A well-tuned broker can handle on the order of hundreds of thousands to millions of messages per second. Kafka scales horizontally by spreading a topic’s partitions across brokers, so adding brokers adds both storage and throughput.

🔹 Topic & Partition

  • A topic is a named stream of records.

  • Each topic is split into partitions, which Kafka uses to scale horizontally and parallelize processing.

Partitions are the fundamental unit of Kafka’s data distribution and fault tolerance. Behind the scenes, they are just append-only log files stored on disk—surprisingly simple for something so powerful.
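To make that concrete, here’s a minimal Python sketch (purely illustrative, not Kafka’s actual implementation) of a partition as an append-only log where consumers pull from any offset at their own pace:

```python
class Partition:
    """Toy model of a Kafka partition: an append-only sequence of records."""

    def __init__(self):
        self._log = []  # the record at index i has offset i

    def append(self, record: bytes) -> int:
        """Append a record and return its offset."""
        self._log.append(record)
        return len(self._log) - 1

    def read(self, offset: int, max_records: int = 10) -> list[bytes]:
        """Consumers pull from any offset, at their own pace."""
        return self._log[offset:offset + max_records]


p = Partition()
for msg in [b"click", b"view", b"purchase"]:
    p.append(msg)

# Two consumers can read independently, each tracking its own offset.
assert p.read(0, 2) == [b"click", b"view"]
assert p.read(2) == [b"purchase"]
```

Nothing is deleted or mutated on read—this is exactly why a slow consumer never blocks a fast one.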

💾 Storage: Append-Only Logs and OS Page Cache Wizardry

One of the most underrated aspects of Kafka is how efficiently it uses the underlying filesystem. Each partition is stored as a directory with log segments, which are immutable files with a configurable size (e.g., 1 GB). New messages are simply appended. Kafka relies heavily on the OS page cache, which means that even without a fancy in-memory cache layer, it can serve reads extremely fast.
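As a rough sketch of segment rolling (the tiny 1 KB threshold and naming here are for the demo; real Kafka names segment files after their base offset and defaults to much larger segment sizes):

```python
import os
import tempfile

SEGMENT_BYTES = 1024  # tiny threshold for the demo; Kafka defaults to ~1 GB

class SegmentedLog:
    """Toy append-only log split into size-bounded segment files,
    each named after the offset of its first record (as Kafka does)."""

    def __init__(self, directory: str):
        self.dir = directory
        self.next_offset = 0
        self._roll()  # open the first segment

    def _roll(self):
        # New segment named by its base offset, e.g. 00000000000000000016.log
        path = os.path.join(self.dir, f"{self.next_offset:020d}.log")
        self.active = open(path, "ab")

    def append(self, record: bytes) -> int:
        if self.active.tell() + len(record) > SEGMENT_BYTES:
            self.active.close()
            self._roll()  # old segment is now immutable
        self.active.write(record)
        offset = self.next_offset
        self.next_offset += 1
        return offset

with tempfile.TemporaryDirectory() as d:
    log = SegmentedLog(d)
    for _ in range(100):
        log.append(b"x" * 64)  # 100 x 64 B overflows 1 KB several times
    log.active.close()
    segments = sorted(os.listdir(d))
```

Because old segments are immutable, retention is just deleting whole files—no compaction of live data structures needed for the basic case.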

🧠 Behind the scenes: Kafka’s "zero-copy" path uses the sendfile() system call (exposed in Java via FileChannel.transferTo()) to move bytes directly from the page cache to network sockets, skipping user-space buffers entirely. One caveat: enabling TLS forces data back through user space for encryption, so this optimization applies to plaintext transfers. It’s a big part of what makes Kafka so fast on read-heavy workloads.
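The same syscall is available from Python, which makes the idea easy to see in isolation. This is a standalone illustration (not Kafka code): os.sendfile() asks the kernel to push bytes from a file descriptor straight into a socket, so the data never enters our process’s buffers.

```python
import os
import socket
import tempfile

# Write some "log segment" data to a temp file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"kafka-records-" * 64)
    path = f.name

# A connected socket pair stands in for a consumer's network connection.
server, client = socket.socketpair()

with open(path, "rb") as segment:
    size = os.fstat(segment.fileno()).st_size
    # Kernel copies page cache -> socket; user space never touches the bytes.
    sent = os.sendfile(server.fileno(), segment.fileno(), 0, size)

data = client.recv(size)
assert sent == size and data.startswith(b"kafka-records-")
server.close(); client.close(); os.unlink(path)
```

Compare this with the naive path—read() into a buffer, then write() to the socket—which costs two extra copies and two extra user/kernel transitions per chunk.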

🔁 Replication & High Availability

Kafka ensures durability and high availability using replication. Each partition has:

  • One leader (handles all reads and writes)

  • Multiple followers (replicate data from the leader)

Only followers that are fully caught up with the leader’s latest data are considered in-sync replicas (ISRs). If the leader fails, Kafka automatically elects a new leader from the ISR set—and because every ISR member holds all committed records, promotion never loses acknowledged data. This is what lets Kafka tolerate broker failures gracefully while still delivering high throughput with strong reliability guarantees.

🔄 The Message Lifecycle: From Producer to Consumer

Here's what happens when a message enters Kafka:

  1. The Producer sends data to a topic.

  2. The partitioner determines which partition the record goes to (by key hash, or round-robin/sticky assignment when there is no key).

  3. Broker appends it to the partition’s log.

  4. Replication happens in parallel.

  5. Consumer reads from the log at its own pace using offsets.

🎯 This decoupled, pull-based model (unlike push-based systems) is part of what makes Kafka so resilient and flexible.

✨ Final Thought

Kafka is one of those rare systems that balances brutal performance requirements with elegant simplicity. I’ve been fortunate to contribute to it, and I’m excited to share more of what I’ve learned and built along the way.

Stay tuned—and feel free to reach out if you’re also diving deep into Kafka’s internals or distributed systems in general.
