Kafka Internals & Underappreciated Powers ⚡️

Mustakim Khatik
3 min read

Kafka has evolved from a humble messaging system into a cornerstone of modern data architectures. While many engineers use Kafka daily, few truly grasp the engineering brilliance hiding beneath its seemingly simple interface.

This blog dives into the technical foundations that make Kafka exceptional and explores capabilities that extend far beyond basic publish/subscribe messaging.

Kafka Is More Than a Messaging System

Kafka is a distributed, high-throughput, low-latency system used for:

Message Brokering: Kafka acts as a broker between producer and consumer applications, facilitating messaging at millions of messages per second.
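
For instance, a minimal Java producer sketch, assuming a broker at localhost:9092 and a hypothetical orders topic:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Fire-and-forget send; the client batches records for throughput.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        }
    }
}
```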

Event Storage: Kafka retains messages for extended periods thanks to its log-based storage architecture, which stores the logs on broker instances organized by topic and partition. The default retention period is 7 days, and it is configurable!
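
Retention can be changed per topic. A sketch using the Java AdminClient, assuming the same local broker and hypothetical orders topic, raising retention to 30 days:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // Extend retention from the 7-day default to 30 days (in milliseconds).
            ConfigEntry entry = new ConfigEntry("retention.ms",
                    String.valueOf(30L * 24 * 60 * 60 * 1000));
            AlterConfigOp op = new AlterConfigOp(entry, AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singleton(op))).all().get();
        }
    }
}
```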

Real-Time Stream Processing: Kafka Streams (a component of Kafka) enables real-time decision making by consuming messages and applying business logic in flight.
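
A minimal Kafka Streams sketch, assuming hypothetical payments and payments-flagged topics and a JSON value carrying an amount_high flag:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PaymentFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-filter"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> payments = builder.stream("payments");
        // Business logic applied in flight: route high-value payments to a separate topic.
        payments.filter((key, value) -> value.contains("\"amount_high\":true"))
                .to("payments-flagged");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```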

Data Integration Pipeline: Kafka Connect can integrate with external data stores, while Kafka Streams transforms the data in transit before it is pushed to a distributed search and analytics engine.

Example: A MariaDB source connector pulls data into Kafka, Kafka Streams transforms it, and an Elasticsearch sink connector pushes the transformed data out.

Event Sourcing: Kafka data is immutable once committed, which ensures integrity, so the log can serve as a source of truth!
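
Because the log is replayable, downstream state can be rebuilt from offset 0. A sketch, assuming the same hypothetical orders topic:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayEvents {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "replay-demo");             // hypothetical group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition p0 = new TopicPartition("orders", 0);
            consumer.assign(Collections.singletonList(p0));
            consumer.seekToBeginning(Collections.singletonList(p0)); // rewind to offset 0
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> r : records) {
                // Rebuild state by applying each immutable event in order.
                System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
            }
        }
    }
}
```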

Zero Copy Principle: Kafka’s Performance Secret

Typically, a messaging system moves data through four copies: it reads data from disk into a kernel-space buffer, copies the kernel buffer into the application buffer, processes the data in the application, copies it back into the kernel socket buffer, and finally sends the socket buffer out through the network interface.

How Kafka achieves zero copy (see the sketch after this list):

Direct memory transfer from disk to network socket

Uses the sendfile() system call to bypass the application heap entirely

Uses memory-mapped files (for example, for its offset indexes) to keep producer and consumer operations efficient!
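
In Java, this path is exposed as FileChannel.transferTo(), which Kafka brokers use to serve fetch requests; on Linux it delegates to sendfile(). A standalone sketch of the mechanism, assuming a local segment.log file and a listener on port 9000 (both hypothetical):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        // transferTo() delegates to sendfile() on Linux: bytes move from the
        // page cache straight to the socket without entering the JVM heap.
        try (FileChannel file = FileChannel.open(Path.of("segment.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0, size = file.size();
            while (position < size) {
                position += file.transferTo(position, size - position, socket);
            }
        }
    }
}
```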

Partitioning: The Key To Kafka’s Scalability

Kafka’s partitioning model is the foundation of its horizontal scalability.

A topic splits into multiple partitions distributed across multiple brokers, and each partition is an ordered, immutable sequence of messages. Partitions enable parallel processing by multiple consumers.
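
A sketch of key-based partition assignment, reusing the hypothetical orders topic: all three events share the key user-7, so they land on (and stay ordered within) the same partition.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String event : new String[]{"created", "paid", "shipped"}) {
                // Same key -> same partition, so this user's events stay ordered.
                RecordMetadata meta = producer.send(
                        new ProducerRecord<>("orders", "user-7", event)).get();
                System.out.println(event + " -> partition " + meta.partition());
            }
        }
    }
}
```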

The partition leader/follower model provides fault tolerance: if KRaft or ZooKeeper detects a leader failure, one of the in-sync followers is promoted to leader, producers and consumers are automatically redirected to the newly elected leader, and once the old leader comes back it may rejoin as a follower.

This ensures:

  • No data loss

  • No downtime (if the ISR is healthy)

Consumer Groups: An Underappreciated Load Balancer

A consumer group is a collection of consumers working together to consume data from a Kafka topic in parallel (a sketch of one group member follows the list below).

This ensures:

  • Equal (ideally) distribution of partitions among the consumers of the same group

  • Dynamic rebalancing when a consumer joins or leaves, which keeps load balanced within the group

  • Automatic recovery from consumer failures

  • Offset management, which underpins at-least-once delivery and, combined with Kafka transactions, exactly-once processing guarantees
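
A minimal sketch of one group member, assuming the hypothetical orders topic and an order-processors group id; running several copies of this process splits the partitions among them:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processors");        // all workers share this id
        props.put("enable.auto.commit", "false");         // commit offsets ourselves
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
                consumer.commitSync(); // mark progress so a replacement resumes here
            }
        }
    }
}
```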

Conclusion:

Kafka's power extends far beyond its reputation as a messaging system. From its zero-copy architecture to its sophisticated partitioning model, Kafka represents a paradigm shift in how we think about data movement and processing. By appreciating these underappreciated aspects, engineers can leverage Kafka not just as a message broker, but as a comprehensive data platform that handles everything from real-time event processing to state management.


Written by

Mustakim Khatik

Go Backend Developer @NPCI