Latency vs Throughput: The Hidden Trade-Off in System Design

When measuring system efficiency, two metrics matter most: latency and throughput.
Latency: Speed of a Single Operation
Definition: The time taken to complete one task (e.g., a database query).
Example: If an API call takes 200ms, that’s its latency.
Goal: Lower latency means faster responses.
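To make this concrete, here is a minimal Python sketch that times a single operation. The handle_request function is a hypothetical stand-in for real work such as a database query or API call:

```python
import time

def handle_request():
    """Stand-in for real work, e.g., a database query or API call."""
    time.sleep(0.05)  # simulate ~50 ms of work

# Latency: wall-clock time for one operation.
start = time.perf_counter()
handle_request()
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.1f} ms")
```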
Throughput: Total Operations Over Time
Definition: The number of operations a system can handle per second.
Example: A server processing 1,000 requests per second has high throughput.
Goal: Maximize throughput to handle more users.
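A similarly minimal sketch for throughput counts how many operations complete within a fixed time window (again, handle_request is a made-up stand-in for real work):

```python
import time

def handle_request():
    """Stand-in for real work."""
    time.sleep(0.005)  # simulate ~5 ms of work

# Throughput: completed operations per second over a fixed window.
window_s = 2.0
deadline = time.perf_counter() + window_s
completed = 0
while time.perf_counter() < deadline:
    handle_request()
    completed += 1
print(f"throughput: {completed / window_s:.0f} ops/sec")
```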
The Trade-Off
High throughput often increases latency: batching requests amortizes per-call overhead, but each individual request must wait for its batch (see the simulation below).
Low latency may reduce throughput: real-time systems process each request immediately, giving up the efficiency gains of batching.
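Here is a toy simulation of the batching trade-off, assuming made-up cost constants (a fixed per-call overhead plus a per-item cost). Larger batches amortize the overhead, so throughput rises, but each item waits longer for its whole batch to complete, so latency rises too:

```python
import time

PER_ITEM_COST_S = 0.001      # 1 ms of work per item (illustrative)
PER_CALL_OVERHEAD_S = 0.005  # 5 ms fixed overhead per call (illustrative)

def process_batch(items):
    """Simulate a call whose cost is fixed overhead plus per-item work."""
    time.sleep(PER_CALL_OVERHEAD_S + PER_ITEM_COST_S * len(items))

n_items = 200
for batch_size in (1, 10, 100):
    start = time.perf_counter()
    for i in range(0, n_items, batch_size):
        process_batch(range(batch_size))
    elapsed = time.perf_counter() - start
    # Throughput rises with batch size because the overhead is amortized...
    throughput = n_items / elapsed
    # ...but an individual item now waits for its entire batch to finish.
    item_latency_ms = (PER_CALL_OVERHEAD_S + PER_ITEM_COST_S * batch_size) * 1000
    print(f"batch={batch_size:>3}: {throughput:6.0f} items/sec, "
          f"~{item_latency_ms:.0f} ms per item")
```

Running this shows throughput climbing from roughly 170 to 950 items/sec as batch size grows, while per-item latency climbs from about 6 ms to over 100 ms.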
Best Practice
For most systems, set an acceptable latency target first, then maximize throughput within that budget.
Example: A search engine should return results quickly (low latency) but also handle millions of queries per second (high throughput).
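One common way to operationalize this, sketched below with made-up numbers, is to set a latency budget at a high percentile (a hypothetical p99 target of 100 ms here) and then push throughput as far as possible without breaking it:

```python
import random
import statistics
import time

LATENCY_BUDGET_MS = 100.0  # hypothetical p99 target

def handle_query():
    """Stand-in for a query: most are fast, a few are slow."""
    time.sleep(random.choice([0.01] * 95 + [0.08] * 5))

latencies_ms = []
start = time.perf_counter()
for _ in range(200):
    t0 = time.perf_counter()
    handle_query()
    latencies_ms.append((time.perf_counter() - t0) * 1000)
elapsed = time.perf_counter() - start

# 99th percentile of observed latencies.
p99 = statistics.quantiles(latencies_ms, n=100)[98]
print(f"throughput: {len(latencies_ms) / elapsed:.0f} qps, p99: {p99:.0f} ms")
print("within budget" if p99 <= LATENCY_BUDGET_MS else "over budget")
```

Tracking a high percentile rather than the average matters because a small fraction of slow requests can dominate user-perceived performance even when the mean looks healthy.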