Ordering Guarantees in Apache Kafka Producers

In distributed systems, message order can make or break correctness for certain use cases, especially financial transactions, audit logs, and real-time event tracking. Kafka preserves order in a predictable and efficient way, but there are trade-offs and nuances every developer should understand.
Kafka’s Message Ordering Model
Kafka guarantees message ordering within a partition. When a producer sends messages in a defined order to the same partition, the broker writes them in that same order, and the consumer reads them in that same order.
Why per-partition?
Because partitions are Kafka’s unit of parallelism and storage. By scoping ordering guarantees to a partition, Kafka balances order and scalability.
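To make the per-partition scope concrete, here is a minimal sketch of how a record key can pin related messages to one partition. The hash below is a simplified stand-in, not Kafka's actual algorithm (the default partitioner hashes the serialized key with murmur2), and the class name, account IDs, and partition count are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyToPartitionSketch {
    // Simplified stand-in for key-based partitioning. This is NOT Kafka's real hash;
    // it only illustrates that the same key always maps to the same partition.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions); // floorMod keeps the result non-negative
    }

    public static void main(String[] args) {
        int numPartitions = 6; // hypothetical topic with six partitions
        for (String accountId : new String[] {"acct-1", "acct-2", "acct-1"}) {
            System.out.println(accountId + " -> partition " + partitionFor(accountId, numPartitions));
        }
    }
}
```

Both "acct-1" records land on the same partition, so their relative order is covered by the guarantee. Records with different keys may land on different partitions, where no relative order is promised.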
When Does Order Matter?
If your use case requires reconstructing a timeline (e.g., bank debits/credits), processing out-of-order messages can lead to data corruption or inconsistencies. In such scenarios, preserving strict message order is non-negotiable.
The Catch: Retries & In-flight Requests
Kafka producers are typically configured for resilience, which includes retries in case of transient failures (like a temporarily unavailable broker). But this introduces a subtle risk:
⚠️ The Ordering Problem
If the producer is not idempotent (enable.idempotence = false) and has:
retries > 0
max.in.flight.requests.per.connection > 1
Then this can happen:
Producer sends Batch A and Batch B to the same partition.
Broker fails to write Batch A but successfully writes Batch B.
Producer retries Batch A and it succeeds.
🚨 Result: Batch B appears before Batch A in the partition.
➡️ Order is violated.
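The sequence above can be reproduced with a toy model. This is a sketch in plain Java, not Kafka client code: the class, the send method, and the "fail once" set are hypothetical names invented for illustration, and a partition is modeled as a simple list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InFlightReorderingSketch {
    // Toy model of one partition. Up to maxInFlight batches are sent per round;
    // a batch listed in failOnce is rejected on its first attempt and re-queued.
    static List<String> send(List<String> batches, Set<String> failOnce, int maxInFlight) {
        Deque<String> pending = new ArrayDeque<>(batches);
        Set<String> alreadyFailed = new HashSet<>();
        List<String> partitionLog = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Take up to maxInFlight batches off the queue: these are "in flight".
            List<String> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.pollFirst());
            }
            List<String> toRetry = new ArrayList<>();
            for (String batch : inFlight) {
                if (failOnce.contains(batch) && alreadyFailed.add(batch)) {
                    toRetry.add(batch);      // transient failure: retry in a later round
                } else {
                    partitionLog.add(batch); // the broker appends the batch
                }
            }
            // Failed batches go back to the front of the queue, keeping their order.
            for (int i = toRetry.size() - 1; i >= 0; i--) {
                pending.addFirst(toRetry.get(i));
            }
        }
        return partitionLog;
    }

    public static void main(String[] args) {
        // Two batches in flight, Batch A fails once: B is written before A.
        System.out.println(send(List.of("A", "B"), Set.of("A"), 2)); // prints [B, A]
        // One batch in flight: the same failure leaves the order intact.
        System.out.println(send(List.of("A", "B"), Set.of("A"), 1)); // prints [A, B]
    }
}
```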
Ensuring Order: The Reliable Approach
To guarantee message order even with retries enabled:
props.put(ProducerConfig.RETRIES_CONFIG, 5);
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
This ensures:
If a message batch fails and is retried,
no later batch is sent on that connection in the meantime,
so order is preserved.
✅ Correctness over Throughput
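For context, here is one way the two settings might sit inside a complete producer configuration. It is a sketch that uses plain string keys so it compiles without the Kafka client on the classpath. The helper name and the "localhost:9092" address are assumptions, and the String serializers are just one possible choice.

```java
import java.util.Properties;

class OrderedProducerConfigSketch {
    // Builds producer properties that favor ordering over throughput.
    static Properties orderedProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("retries", "5");                               // retry transient failures
        props.put("max.in.flight.requests.per.connection", "1"); // one unacknowledged request at a time
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = orderedProducerProps("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The resulting Properties object can be passed to the KafkaProducer constructor in a project that depends on kafka-clients.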
The Trade-off: Throughput Penalty
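Limiting max.in.flight.requests.per.connection to 1 significantly reduces parallelism, because the producer must wait for each request to be acknowledged before sending the next one on that connection. For high-throughput workloads where order is not crucial, this may not be desirable.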
🎯 Guideline: Only enforce strict ordering when business logic absolutely requires it. Otherwise, tune for throughput.
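A rough back-of-the-envelope model shows where the penalty comes from. It assumes the only limit is the network round trip between producer and broker, and it ignores batching, compression, and broker processing time, so treat the numbers as an upper bound for illustration rather than a benchmark. The 10 ms round trip and the class name are hypothetical.

```java
class InFlightThroughputSketch {
    // Upper bound on produce requests per second over one broker connection:
    // at most maxInFlight requests can complete per round trip.
    static double maxRequestsPerSecond(int maxInFlight, double roundTripMillis) {
        return maxInFlight * 1000.0 / roundTripMillis;
    }

    public static void main(String[] args) {
        double roundTripMillis = 10.0; // hypothetical producer-to-broker round trip
        System.out.println("1 in flight: " + maxRequestsPerSecond(1, roundTripMillis) + " requests/s");
        System.out.println("5 in flight: " + maxRequestsPerSecond(5, roundTripMillis) + " requests/s");
    }
}
```

Under these assumptions, a single in-flight request caps the connection at 100 requests per second, while five in-flight requests allow up to 500. Larger batches can recover part of the difference.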
Summary
Kafka’s partition-level ordering is strong and reliable. But when producers retry failed sends and allow several in-flight requests per connection, order violations can occur. You can prevent this by trading off throughput and allowing only one in-flight request per connection.
✅ Set retries > 0
✅ Set max.in.flight.requests.per.connection = 1
📉 Expect lower throughput
🎯 But guaranteed order.
Order or throughput: the choice depends on what your system values most.
Written by Vijay Belwal
