Latency vs Throughput


☕ Understanding Latency vs Throughput in Real-World Systems

Whenever I talk to devs diving into distributed systems or backend engineering, one of the first conceptual roadblocks is understanding latency and throughput — and more importantly, how to balance them.

So here’s how I’d explain it if we were at a coffee shop, sketching system designs on a napkin.


♻ What is Latency?

Latency is the time it takes to complete one request.

Think of it as the delay between:

  • You hitting “Submit Order” on Amazon

  • And the page confirming: “Your order is placed!”

🕒 Low latency = fast response
😩 High latency = waiting too long

Unit: milliseconds (ms)
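
To make this concrete, here’s a tiny Java sketch that times a single (simulated) request; the Thread.sleep is a stand-in for real work like an HTTP call or a DB read:

public class LatencyDemo {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        Thread.sleep(25); // stand-in for one real request
        long latencyMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Latency: " + latencyMs + " ms");
    }
}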


📆 What is Throughput?

Throughput is the number of requests your system can handle per unit of time (typically per second or per minute).

Think of it like how many coffee orders a café can serve per hour.

📈 High throughput = handling lots of users
📉 Low throughput = bottlenecks, queue build-up

Unit: requests/sec, events/min
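
And a matching sketch for throughput: count how many (simulated) requests complete inside a one-second window:

public class ThroughputDemo {
    public static void main(String[] args) throws Exception {
        long completed = 0;
        long end = System.currentTimeMillis() + 1_000; // 1-second window
        while (System.currentTimeMillis() < end) {
            Thread.sleep(5); // stand-in for handling one request (~5 ms each)
            completed++;
        }
        System.out.println("Throughput: " + completed + " requests/sec");
    }
}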


⚖️ Latency vs Throughput: Not Always Best Friends

Here’s where the balance gets tricky:

  • If you optimize for latency (fast responses), you often have to limit concurrent load, which lowers throughput

  • If you optimize for throughput (handle more users), you often batch or queue work, which raises latency (see the sketch below)
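
To see the trade-off in code, here’s a minimal (hypothetical) batcher: writing 5 items per flush amortizes overhead and boosts throughput, but early items wait in the buffer, which is added latency:

import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    private static final int BATCH_SIZE = 5;
    private final List<String> buffer = new ArrayList<>();

    // Each call returns quickly, but an item may sit in the buffer
    // (added latency) until the batch fills (better throughput)
    public void write(String item) {
        buffer.add(item);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    private void flush() {
        // One bulk operation amortizes per-request overhead
        System.out.println("Flushing batch of " + buffer.size() + " items");
        buffer.clear();
    }
}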


☕ Coffee Shop Analogy

Scenario | Latency | Throughput
One customer gets coffee instantly | ✅ Low | ❌ Low
Barista makes 5 coffees in a batch | ❌ Higher | ✅ High
Everyone queued for a single machine | 😩 High | 😩 Low

💻 Real-World Examples

🚀 Low Latency Systems:

  • Payment confirmation

  • Live chat

  • Gaming servers

  • High-frequency trading

📊 High Throughput Systems:

  • Analytics pipelines

  • Log processors

  • E-commerce order exports

  • Stream aggregators (Kafka consumers, Flink jobs)


🧠 So... How Do Big Tech Companies Handle It?

They architect their systems to decouple latency-sensitive and throughput-heavy workloads.

Here’s how:

✅ Techniques Big Companies Use:

Strategy | What It Does
Auto-scaling | Adds more servers when load increases
Load balancing | Distributes requests evenly across servers
CDNs | Serve cached content close to the user
Async processing | Offloads slow tasks (like email, logs) off the request path
Data compression | Reduces network transfer time
Queueing systems | Protect latency-sensitive paths (e.g., Kafka, SQS)
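
As one concrete example of async processing, here’s a minimal sketch using Java’s CompletableFuture; sendWelcomeEmail is a hypothetical slow task that we push off the latency-sensitive request path:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncOffload {
    private static final ExecutorService pool = Executors.newFixedThreadPool(4);

    public static void handleSignup(String email) {
        // Fast, latency-sensitive work happens inline...
        System.out.println("Account created for " + email);

        // ...while the slow task runs in the background on a worker pool
        CompletableFuture.runAsync(() -> sendWelcomeEmail(email), pool);
    }

    private static void sendWelcomeEmail(String email) {
        // Hypothetical slow task: email, logging, etc.
        System.out.println("Sending welcome email to " + email);
    }
}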

📆 In Code Terms:

A Spring-style sketch (orderSummaryCache and kafkaTemplate are assumed to be injected dependencies):

@GetMapping("/api/order-summary")
public Order getSummary() {
    // Latency-sensitive path: return a lightweight, precomputed result
    // (e.g., from a cache or a small indexed DB read)
    return orderSummaryCache.get();
}

@PostMapping("/api/notify-user")
public void notifyUser(@RequestBody Notification notification) {
    // Throughput-friendly path: hand off to Kafka and return immediately;
    // a consumer processes notifications asynchronously, often in batches
    kafkaTemplate.send("user-notifications", notification);
}

🧪 TL;DR

Metric | Think... | Best for...
Latency | “How fast is my one request?” | Real-time user experience
Throughput | “How many requests can I handle?” | Large-scale processing, batching

⚖️ The art of system design is figuring out where you can afford latency and where you must prioritize throughput — or vice versa.
