Latency vs Throughput

☕ Understanding Latency vs Throughput in Real-World Systems
Whenever I talk to devs diving into distributed systems or backend engineering, one of the first conceptual roadblocks is understanding latency and throughput — and more importantly, how to balance them.
So here’s how I’d explain it if we were at a coffee shop, chatting over a napkin sketch of the system design.
♻ What is Latency?
Latency is the time it takes to complete one request.
Think of it as the delay between:
- You hitting “Submit Order” on Amazon
- And the page confirming: “Your order is placed!”
🕒 Low latency = fast response
😩 High latency = waiting too long
Unit: milliseconds (ms)
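
Want to see latency in code? Here’s a tiny sketch that times a single request. placeOrder() is a hypothetical stand-in that just sleeps for 120 ms:

```java
// Measure the latency of one request: timestamp before, timestamp after.
public class LatencyDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();
        placeOrder(); // the single request we’re timing
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Latency: " + elapsedMs + " ms"); // prints ~120 ms
    }

    // Hypothetical backend call, simulated with a sleep.
    static void placeOrder() {
        try {
            Thread.sleep(120);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```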
📆 What is Throughput?
Throughput is the number of requests your system can handle per second/minute.
Think of it like how many coffee orders a café can serve per hour.
📈 High throughput = handling lots of users
📉 Low throughput = bottlenecks, queue build-up
Unit: requests/sec, events/min
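
And here’s a rough sketch of measuring throughput: fire a fixed number of simulated requests at a thread pool and divide by the elapsed time. handleRequest() is a hypothetical 10 ms task, and the pool size of 8 is arbitrary:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Throughput = completed requests / elapsed time.
public class ThroughputDemo {
    public static void main(String[] args) throws InterruptedException {
        int totalRequests = 500;
        ExecutorService pool = Executors.newFixedThreadPool(8);

        long start = System.currentTimeMillis();
        for (int i = 0; i < totalRequests; i++) {
            pool.submit(ThroughputDemo::handleRequest);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all requests to finish

        double seconds = (System.currentTimeMillis() - start) / 1000.0;
        System.out.printf("Throughput: %.0f requests/sec%n", totalRequests / seconds);
    }

    // Hypothetical request handler that takes ~10 ms.
    static void handleRequest() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With 8 threads and 10 ms per request, this lands around 800 requests/sec; add more threads and the number climbs.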
⚖️ Latency vs Throughput: Not Always Best Friends
Here’s where the balance gets tricky:
- If you optimize for latency (fast responses), you often have to cap concurrent load = lower throughput
- If you optimize for throughput (handling more users), you often batch or queue work = higher latency (see the sketch below)
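
Here’s a little sketch of that batching trade-off. writeBatch() is a hypothetical expensive I/O call, and the batch size of 5 is made up:

```java
import java.util.ArrayList;
import java.util.List;

// Batching amortizes fixed per-call overhead (better throughput),
// but early items sit in the buffer waiting for the batch to fill
// (worse latency for each individual item).
public class BatchingDemo {
    private static final int BATCH_SIZE = 5;
    private final List<String> buffer = new ArrayList<>();

    // Low latency: flush every event immediately (one I/O per event).
    void sendImmediately(String event) {
        writeBatch(List.of(event));
    }

    // High throughput: buffer events until the batch fills,
    // so one I/O call covers five events.
    void sendBatched(String event) {
        buffer.add(event);
        if (buffer.size() >= BATCH_SIZE) {
            writeBatch(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Stand-in for an expensive network or disk write.
    private void writeBatch(List<String> events) {
        System.out.println("Wrote " + events.size() + " event(s) in one I/O call");
    }
}
```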
☕ Coffee Shop Analogy
| Scenario | Latency | Throughput |
| --- | --- | --- |
| One customer gets coffee instantly | ✅ Low | ❌ Low |
| Barista makes 5 coffees in a batch | ❌ Higher | ✅ High |
| Everyone queued for a single machine | 😩 High | 😩 Low |
💻 Real-World Examples
🚀 Low Latency Systems:
- Payment confirmation
- Live chat
- Gaming servers
- High-frequency trading
📊 High Throughput Systems:
- Analytics pipelines
- Log processors
- E-commerce order exports
- Stream aggregators (Kafka consumers, Flink jobs)
🧠 So... How Do Big Tech Companies Handle It?
They architect their systems to decouple latency-sensitive and throughput-heavy workloads.
Here’s how:
✅ Techniques Big Companies Use:
| Strategy | What It Does |
| --- | --- |
| Auto-scaling | Adds more servers when load increases |
| Load balancing | Distributes requests evenly across servers |
| CDNs | Serve cached content near the user |
| Async processing | Offloads slow tasks (like email, logs) |
| Data compression | Reduces network transfer time |
| Queueing systems | Protect latency paths (e.g., Kafka, SQS) |
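
To make the async-processing row concrete, here’s a minimal sketch: the request thread does only the fast, must-happen-now work and hands the slow part to a background pool. saveOrder(), sendEmail(), and the 2-second sleep are all hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Keep the latency path fast by pushing slow work off the request thread.
public class CheckoutService {
    private final ExecutorService background = Executors.newFixedThreadPool(4);

    public String placeOrder(String orderId) {
        saveOrder(orderId);                          // fast, must happen now
        background.submit(() -> sendEmail(orderId)); // slow, can happen later
        return "Order " + orderId + " placed!";      // user sees this immediately
    }

    private void saveOrder(String orderId) {
        // pretend this is a quick DB write
    }

    private void sendEmail(String orderId) {
        try {
            Thread.sleep(2_000); // pretend the email takes 2 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The user’s confirmation shows up in milliseconds either way; the email just arrives a couple of seconds later.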
📆 In Code Terms:
```java
// Latency-sensitive path: return quickly with precomputed data.
@GetMapping("/api/order-summary")
public Order getSummary() {
    // Fetch a lightweight, precomputed result (from cache or a small DB read).
    // orderCache is an assumed injected bean.
    return orderCache.getSummary();
}

// Throughput-friendly path: hand the work off and return immediately.
@PostMapping("/api/notify-user")
public void notifyUser() {
    // Send to Kafka asynchronously so the request thread isn’t blocked.
    kafkaTemplate.send("user-notifications", "user-event");
}
```
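
For the curious, here’s what a fuller, self-contained version of that notify endpoint might look like with spring-kafka. The topic name “user-notifications”, the String payload, and the injected KafkaTemplate bean are assumptions, not a fixed recipe:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class NotificationController {
    private final KafkaTemplate<String, String> kafka;

    public NotificationController(KafkaTemplate<String, String> kafka) {
        this.kafka = kafka; // injected by Spring
    }

    @PostMapping("/api/notify-user")
    public void notifyUser(@RequestBody String payload) {
        // send() hands the event to Kafka and returns right away;
        // consumers process it later, so this endpoint stays fast
        // while throughput scales with the number of consumers.
        kafka.send("user-notifications", payload);
    }
}
```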
🧪 TL;DR
| Metric | Think... | Best for... |
| --- | --- | --- |
| Latency | “How fast is my one request?” | Real-time user experience |
| Throughput | “How many requests can I handle?” | Large-scale processing, batching |
⚖️ The art of system design is figuring out where you can afford to trade latency for throughput, and where you can’t.