Understanding Apache Kafka: A Beginner's Guide to Partitions, Consumers, and Migrations


Apache Kafka is a cornerstone of modern distributed systems, yet its fundamental concepts can be daunting for newcomers. This guide demystifies Kafka's core—partitions, consumers, and topic migrations—with clear explanations, practical examples, and visual aids to set you on the right path.
What is Apache Kafka?
Apache Kafka is a distributed streaming platform built for high-throughput, fault-tolerant messaging. Imagine a massive, continuous conveyor belt: producers place messages onto it, and consumers retrieve them, each at their own pace.
Core Concepts: Topics & Partitions
Topics: The Communication Channels
In Kafka, a topic is a named channel or stream where producers publish messages and consumers subscribe to read them. Think of a topic as:
- A database table: Each message is an immutable, append-only row.
- A persistent queue: Unlike traditional queues, Kafka topics are durable and replayable. Messages are not removed after being read, allowing multiple consumers to process the same data independently and at their own pace.
Examples: `user-signups`, `payment-transactions`, `order-updates`
Partitions: The Key to Parallelism
Each Kafka topic is divided into partitions, which are ordered, immutable sequences of messages. Partitions are fundamental to Kafka's scalability and parallelism.
```mermaid
graph TD
    A[Topic: user-events] --> B[Partition 0]
    A --> C[Partition 1]
    A --> D[Partition 2]
    B --> B1[Message 1]
    B --> B2[Message 2]
    B --> B3[Message 3]
    C --> C1[Message 1]
    C --> C2[Message 2]
    D --> D1[Message 1]
    D --> D2[Message 2]
    D --> D3[Message 3]
    D --> D4[Message 4]
```
Every message within a partition has a unique, sequential offset. Kafka guarantees message order only within a single partition, not across the entire topic.
How Kafka Assigns Messages to Partitions: The Message Key
Producers can include an optional key with each message. This key determines the target partition through deterministic hashing:
- With a key: `partition = hash(key) % num_partitions` (the Java client's default partitioner uses a murmur2 hash of the key bytes).
- Without a key: the client spreads messages across partitions, using round-robin in older clients or the "sticky" partitioner (the default since Kafka 2.4) that fills one batch at a time; a custom partitioner can override either behavior.
This mechanism ensures:
- All messages with the same key are directed to the same partition.
- Message order is preserved only for messages sharing the same key.
Crucial Point: If strict message order is vital for related events (e.g., all actions by a specific user), always use the same key for those messages.
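As a rough sketch of what key-based partition selection does, consider the following. This is not Kafka's actual implementation (the Java client hashes key bytes with murmur2); `zlib.crc32` is only a stand-in hash to keep the example runnable:

```python
# Conceptual sketch of key-based partition selection, not Kafka's actual
# partitioner: the Java client uses murmur2, while zlib.crc32 here is a
# stand-in hash that keeps the example self-contained.
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

for user_id in [b"101", b"102", b"103", b"101"]:
    print(user_id.decode(), "->", pick_partition(user_id, 3))
# "101" maps to the same partition both times it appears.
```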
Keyed Message Example
Consider processing `user-events` using `user_id` as the key:
```mermaid
graph TD
    A[Producer] -->|user_id: 101| P0[Partition 0]
    A -->|user_id: 102| P1[Partition 1]
    A -->|user_id: 103| P2[Partition 2]
    A -->|user_id: 101| P0
    A -->|user_id: 102| P1
    A -->|user_id: 103| P2
```
Here, all events for `user_id: 101` consistently land in Partition 0, `user_id: 102` in Partition 1, and so on. This preserves order for individual users while still achieving parallelism across different users.
Design Tip: Use keys intentionally to group related events and plan your partition count based on expected key distribution.
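To make this concrete, here is a minimal keyed-producer sketch. It assumes the confluent-kafka Python client (`pip install confluent-kafka`); the broker address and topic name are placeholders, and any Kafka client works the same way:

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

events = [("101", "login"), ("102", "login"), ("101", "checkout")]
for user_id, action in events:
    # Same key -> same partition -> per-user ordering is preserved.
    producer.produce("user-events", key=user_id, value=action)

producer.flush()  # block until the broker has acknowledged everything
```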
Consumers and Consumer Groups
Why One Consumer Per Partition?
Kafka's design dictates that only one consumer within a consumer group can read from a given partition at any time. This strict rule is crucial for preserving message order within partitions.
If multiple consumers could read from the same partition concurrently, message ordering would break, especially for keyed messages where sequential processing is critical.
```mermaid
graph TD
    subgraph Partition 0
        M1[Message 1 - user 101]
        M2[Message 2 - user 101]
        M3[Message 3 - user 101]
    end
    subgraph Consumer A
        A1[Receives M1]
        A2[Receives M3]
    end
    subgraph Consumer B
        B1[Receives M2]
    end
    M1 --> A1
    M2 --> B1
    M3 --> A2
```
Note: The above diagram illustrates a hypothetical scenario that is impossible in Kafka. It demonstrates why Kafka enforces the one-consumer-per-partition rule: to prevent out-of-order processing of messages within a partition.
Kafka's design ensures:
- ✅ One partition → One active consumer (per group)
- ✅ Guaranteed in-order delivery for messages within that partition
What is a Consumer Group?
A consumer group is a collection of consumers that cooperate to read data from a topic. Kafka distributes the topic's partitions among the consumers in the group, ensuring each partition is processed by exactly one consumer.
- Multiple consumer groups can read the same topic independently, each receiving a full copy of all messages.
- Within a single consumer group, partitions are exclusively assigned to individual consumers.
```mermaid
graph TD
    A[Topic: payments<br/>1 Partition] --> B[Partition 0]
    subgraph Consumer Group
        C1[Consumer 1] --> B
        C2[Consumer 2] -.->|Idle| B
        C3[Consumer 3] -.->|Idle| B
    end
    style C1 fill:#4CAF50,stroke:#388E3C
    style C2 fill:#9E9E9E,stroke:#616161
    style C3 fill:#9E9E9E,stroke:#616161
```
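A minimal consumer sketch, again assuming the confluent-kafka client: every process started with the same `group.id` joins the same group and shares the partitions, while a process with a different `group.id` receives its own full copy of the topic:

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "payments-processor",  # same id = same group = shared partitions
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])

try:
    while True:
        msg = consumer.poll(1.0)  # wait up to 1s for the next message
        if msg is None or msg.error():
            continue
        print(f"partition={msg.partition()} offset={msg.offset()} "
              f"value={msg.value()}")
finally:
    consumer.close()
```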
Partition Count vs. Consumer Count: Maximizing Parallelism
To fully leverage consumer parallelism, you should aim for:
`number_of_partitions >= number_of_consumers_in_group`
- If you have fewer partitions than consumers, some consumers will remain idle as partitions cannot be shared.
- If you have more partitions than consumers, some consumers will process multiple partitions.
Why? Partitions are Kafka's unit of parallelism. More partitions allow for more parallel consumers, leading to higher throughput.
Multiple Partitions = Parallel Consumers
Increasing the number of partitions directly enables parallel processing.
```mermaid
graph TD
    A[Topic: orders<br/>4 Partitions] --> B[Partition 0]
    A --> C[Partition 1]
    A --> D[Partition 2]
    A --> E[Partition 3]
    subgraph Consumer Group
        C1[Consumer 1] --> B
        C1 --> C
        C2[Consumer 2] --> D
        C2 --> E
    end
    style C1 fill:#4CAF50
    style C2 fill:#4CAF50
```
This design lets consumer throughput scale with the partition count.
Rebalancing Consumers
When consumers join or leave a group, Kafka automatically rebalances partition ownership among the remaining or new consumers.
```mermaid
sequenceDiagram
    participant T as Topic (3 partitions)
    participant C1 as Consumer 1
    participant C2 as Consumer 2
    participant C3 as Consumer 3
    participant C4 as Consumer 4
    T->>C1: Assign Partition 0
    T->>C2: Assign Partition 1
    T->>C3: Assign Partition 2
    Note right of C4: Idle - no partitions left
    Note over C1,C3: Rebalancing occurs when consumers join/leave
    C1->>T: Consumer 1 leaves
    T->>C2: Assign Partition 0
    T->>C4: Assign Partition 1
    T->>C3: Keep Partition 2
```
Impact: Rebalancing can cause temporary processing delays (lag spikes) as partitions are reassigned.
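One practical way to observe rebalancing is through the assignment callbacks the confluent-kafka client accepts on `subscribe` (group and topic names below are placeholders). Logging these windows makes it easy to correlate lag spikes with rebalances:

```python
from confluent_kafka import Consumer

def on_assign(consumer, partitions):
    # Fires after a rebalance hands this consumer its partitions.
    print("assigned:", [(p.topic, p.partition) for p in partitions])

def on_revoke(consumer, partitions):
    # Fires before partitions are taken away, e.g. when another consumer joins.
    print("revoked:", [(p.topic, p.partition) for p in partitions])

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-processor",
})
consumer.subscribe(["orders"], on_assign=on_assign, on_revoke=on_revoke)
```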
Scaling with Partitions: A Critical Consideration
You Can Only Increase Partitions
Kafka allows you to increase the number of partitions for a topic, but never decrease them. This limitation exists due to:
- Offset Mismatch Risks: Decreasing partitions could lead to inconsistencies in message offsets.
- Potential Data Loss: Messages might be lost during a partition reduction.
- Consumer Confusion: Rebalancing becomes chaotic and unreliable.
```bash
# ✅ Increase partitions
kafka-topics --alter --topic user-events --partitions 8 \
  --bootstrap-server localhost:9092

# ❌ This will fail: the partition count can only grow
# kafka-topics --alter --topic user-events --partitions 2
```
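The same change can be made programmatically; this sketch assumes confluent-kafka's admin API. Note that `NewPartitions` takes the new *total* count, not a delta:

```python
from confluent_kafka.admin import AdminClient, NewPartitions

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Grow user-events to 8 partitions; asking for fewer than the current
# count makes the broker reject the request.
futures = admin.create_partitions([NewPartitions("user-events", 8)])
for topic, future in futures.items():
    future.result()  # raises on failure (e.g. an attempted decrease)
    print(f"{topic}: partition count increased")
```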
Planning Your Partition Count: Start Small, Scale Up
It's generally best to start with a lower number of partitions and increase them as your needs grow; starting with too many can introduce unnecessary overhead, and since you can only increase, never decrease, beginning small allows for smoother, iterative scaling. One caveat: adding partitions changes the result of `hash(key) % num_partitions`, so messages with the same key produced before and after the change may land in different partitions.
Monitor throughput and consumer utilization to determine the right time to add more partitions (a quick lag check is shown after this list). Also consider:
- Expected throughput: How much data will flow through this topic?
- Number of consumers: How many consumers will process this data?
- Future growth: Anticipate your system's scaling needs.
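A quick way to check consumer utilization is the built-in `kafka-consumer-groups` tool, which reports per-partition lag (the group name is a placeholder):

```bash
kafka-consumer-groups --describe --group payments-processor \
  --bootstrap-server localhost:9092
```

Consistently growing lag across all partitions is a good signal that it's time to add partitions and consumers.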
Kafka Topic Migrations: Navigating the Challenges
Moving Kafka topics between clusters or environments presents unique challenges.
The “Unreadable Message” Problem
This common issue often arises with Avro-based data:
```mermaid
graph LR
    A[Source Cluster] --> B[Messages with<br/>Schema ID 42]
    B --> C[Migration Tool]
    C --> D[Target Cluster]
    D --> E[Messages with<br/>Schema ID 42]
    E -.-> F[Schema Registry<br/>ID 42 ≠ Different Schema!]
```
The problem occurs when the schema ID embedded in each message points, in the target environment's schema registry, to a different schema (or to none at all), rendering the messages unreadable.
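To see why, it helps to know the Confluent Schema Registry wire format: each Avro message starts with a 5-byte header, a magic byte of 0 followed by a big-endian schema ID. A sketch of decoding that header:

```python
import struct

def parse_header(message: bytes):
    # Confluent wire format: 1 magic byte (0) + 4-byte big-endian schema ID.
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("not Confluent Schema Registry wire format")
    return schema_id, message[5:]  # (schema ID, raw Avro payload)

schema_id, payload = parse_header(b"\x00\x00\x00\x00\x2a" + b"<avro bytes>")
print(schema_id)  # 42, meaningful only in the registry it was written against
```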
Tombstone Messages: Don't Lose Them!
Tombstone messages are special messages with `null` values, used for:
- Deleting keys in compacted topics.
- Triggering data cleanup.
- Representing logical deletions.
Ensure these critical messages are handled correctly during migration to avoid data inconsistencies.
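Producing a tombstone is just sending a record whose value is `null`; a sketch with the confluent-kafka client (topic name is a placeholder):

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# A tombstone: a keyed record with value=None. In a compacted topic this
# tells Kafka to eventually drop every earlier record with this key.
producer.produce("user-profiles", key="101", value=None)
producer.flush()
```

On the consumer side, check `msg.value() is None` before deserializing, since Avro or JSON decoders will otherwise fail on the empty payload.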
Best Practices for Topic Migration
- Test thoroughly: Always test migrations with sample data first.
- Schema Registry compatibility: Verify that schema registries in source and target environments are compatible.
- Tombstone handling: Confirm your migration strategy correctly handles tombstone messages.
- Monitor rebalancing: Keep a close eye on consumer group rebalancing during and after migration.
- Tool caution: Use tools like MirrorMaker2 or Confluent Replicator with a deep understanding of their behavior and limitations.
Practical CLI Examples
Scenario 1: Ordered, Low-Volume Events
For topics requiring strict ordering and handling low throughput (e.g., user actions where sequence is critical):
```bash
kafka-topics --create --topic user-actions \
  --partitions 1 --replication-factor 3 \
  --bootstrap-server localhost:9092
```
Outcome: Only one consumer will be active, ensuring strict ordering.
Scenario 2: High Throughput Event Streams
For topics demanding high parallelism and real-time processing (e.g., web events for analytics):
```bash
kafka-topics --create --topic web-events \
  --partitions 16 --replication-factor 3 \
  --bootstrap-server localhost:9092
```
Outcome: Up to 16 consumers can process messages in parallel, ideal for real-time analytics or event sourcing.
Key Takeaways
- Partitions = Parallelism: The fundamental unit for scaling Kafka.
- One Partition → One Consumer (per group): Ensures in-order processing within a partition.
- Partition Count is Immutable Downward: Plan carefully; you can only increase partitions.
- Migration is Tricky: Pay close attention to schema compatibility and tombstone messages.
- Rebalance = Temporary Lag Spikes: Be aware of processing delays during consumer group rebalancing.
Understanding Kafka’s partition model and consumer mechanics is paramount for building resilient, scalable distributed systems. As the saying goes, "With great partitioning comes great responsibility." 🧠
Final Thoughts
This guide provided a solid foundation in Kafka's core concepts. However, Kafka is a vast ecosystem with many advanced features, including:
- Kafka Streams for real-time stream processing
- Exactly-once semantics and transactions for data integrity
- Schema Registry and various serialization formats
- Comprehensive security, monitoring, and operational tooling
Consider this your starting point. Continue exploring Kafka's broader capabilities to unlock its full potential in your distributed applications!