Latency vs Throughput


☕ Understanding Latency vs Throughput in Real-World Systems

Whenever I talk to devs diving into distributed systems or backend engineering, one of the first conceptual roadblocks is understanding latency and throughput — and more importantly, how to balance them.

So here’s how I’d explain it if we were at a coffee shop, sketching system designs on a napkin.


♻ What is Latency?

Latency is the time it takes to complete one request.

Think of it as the delay between:

  • You hitting “Submit Order” on Amazon

  • And the page confirming: “Your order is placed!”

🕒 Low latency = fast response
😩 High latency = waiting too long

Unit: milliseconds (ms)
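
To make this concrete, here’s a tiny Java sketch that times a single (simulated) request; the Thread.sleep is a stand-in for real work like an HTTP call or a DB read:

public class LatencyDemo {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        Thread.sleep(25); // stand-in for one real request
        long latencyMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Latency: " + latencyMs + " ms");
    }
}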


📆 What is Throughput?

Throughput is the number of requests your system can handle per unit of time (typically per second or per minute).

Think of it like how many coffee orders a café can serve per hour.

📈 High throughput = handling lots of users
📉 Low throughput = bottlenecks, queue build-up

Unit: requests/sec, events/min
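
And a matching sketch for throughput: count how many (simulated) requests complete inside a one-second window:

public class ThroughputDemo {
    public static void main(String[] args) throws Exception {
        long completed = 0;
        long end = System.currentTimeMillis() + 1_000; // 1-second window
        while (System.currentTimeMillis() < end) {
            Thread.sleep(5); // stand-in for handling one request (~5 ms each)
            completed++;
        }
        System.out.println("Throughput: " + completed + " requests/sec");
    }
}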


⚖️ Latency vs Throughput: Not Always Best Friends

Here’s where the balance gets tricky:

  • If you optimize for latency (fast responses), you often have to limit concurrent load, which lowers throughput

  • If you optimize for throughput (handle more users), you often batch or queue work, which raises latency (see the sketch below)
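
To see the trade-off in code, here’s a minimal (hypothetical) batcher: writing 5 items per flush amortizes overhead and boosts throughput, but early items wait in the buffer, which is added latency:

import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    private static final int BATCH_SIZE = 5;
    private final List<String> buffer = new ArrayList<>();

    // Each call returns quickly, but an item may sit in the buffer
    // (added latency) until the batch fills (better throughput)
    public void write(String item) {
        buffer.add(item);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    private void flush() {
        // One bulk operation amortizes per-request overhead
        System.out.println("Flushing batch of " + buffer.size() + " items");
        buffer.clear();
    }
}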


☕ Coffee Shop Analogy

Scenario | Latency | Throughput
One customer gets coffee instantly | ✅ Low | ❌ Low
Barista makes 5 coffees in a batch | ❌ Higher | ✅ High
Everyone queued for a single machine | 😩 High | 😩 Low

💻 Real-World Examples

🚀 Low Latency Systems:

  • Payment confirmation

  • Live chat

  • Gaming servers

  • High-frequency trading

📊 High Throughput Systems:

  • Analytics pipelines

  • Log processors

  • E-commerce order exports

  • Stream aggregators (Kafka consumers, Flink jobs)


🧠 So... How Do Big Tech Companies Handle It?

They architect their systems to decouple latency-sensitive and throughput-heavy workloads.

Here’s how:

✅ Techniques Big Companies Use:

Strategy | What It Does
Auto-scaling | Adds more servers when load increases
Load balancing | Distributes requests evenly across servers
CDNs | Serve cached content close to the user
Async processing | Offloads slow tasks (like email, logs) off the request path
Data compression | Reduces network transfer time
Queueing systems | Protect latency-sensitive paths (e.g., Kafka, SQS)
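
As one concrete example of async processing, here’s a minimal sketch using Java’s CompletableFuture; sendWelcomeEmail is a hypothetical slow task that we push off the latency-sensitive request path:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncOffload {
    private static final ExecutorService pool = Executors.newFixedThreadPool(4);

    public static void handleSignup(String email) {
        // Fast, latency-sensitive work happens inline...
        System.out.println("Account created for " + email);

        // ...while the slow task runs in the background on a worker pool
        CompletableFuture.runAsync(() -> sendWelcomeEmail(email), pool);
    }

    private static void sendWelcomeEmail(String email) {
        // Hypothetical slow task: email, logging, etc.
        System.out.println("Sending welcome email to " + email);
    }
}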

📆 In Code Terms:

A Spring-style sketch (orderSummaryCache and kafkaTemplate are assumed to be injected dependencies):

@GetMapping("/api/order-summary")
public Order getSummary() {
    // Latency-sensitive path: return a lightweight, precomputed result
    // (e.g., from a cache or a small indexed DB read)
    return orderSummaryCache.get();
}

@PostMapping("/api/notify-user")
public void notifyUser(@RequestBody Notification notification) {
    // Throughput-friendly path: hand off to Kafka and return immediately;
    // a consumer processes notifications asynchronously, often in batches
    kafkaTemplate.send("user-notifications", notification);
}

🧪 TL;DR

Metric | Think... | Best for...
Latency | “How fast is my one request?” | Real-time user experience
Throughput | “How many requests can I handle?” | Large-scale processing, batching

⚖️ The art of system design is figuring out where you can afford latency and where you must prioritize throughput — or vice versa.
