Latency vs Throughput: The Hidden Trade-Off in System Design

When measuring system efficiency, two metrics matter most: latency and throughput.
Latency: Speed of a Single Operation
Definition: The time taken to complete one task (e.g., a database query).
Example: If an API call takes 200ms, that’s its latency.
Goal: Lower latency means faster responses.
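To make this concrete, here is a minimal Python sketch that times a single operation. The handle_request function is a hypothetical stand-in for real work such as a database query or API call:

```python
import time

def handle_request():
    """Stand-in for real work, e.g., a database query or API call."""
    time.sleep(0.05)  # simulate ~50 ms of work

# Latency: wall-clock time for one operation.
start = time.perf_counter()
handle_request()
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.1f} ms")
```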
Throughput: Total Operations Over Time
Definition: The number of operations a system can handle per second.
Example: A server processing 1,000 requests per second has high throughput.
Goal: Maximize throughput to handle more users.
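A similarly minimal sketch for throughput counts how many operations complete within a fixed time window (again, handle_request is a made-up stand-in for real work):

```python
import time

def handle_request():
    """Stand-in for real work."""
    time.sleep(0.005)  # simulate ~5 ms of work

# Throughput: completed operations per second over a fixed window.
window_s = 2.0
deadline = time.perf_counter() + window_s
completed = 0
while time.perf_counter() < deadline:
    handle_request()
    completed += 1
print(f"throughput: {completed / window_s:.0f} ops/sec")
```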
The Trade-Off
High throughput often increases latency: batching requests amortizes per-call overhead, but each individual request must wait for its batch (see the simulation below).
Low latency may reduce throughput: real-time systems process each request immediately, giving up the efficiency gains of batching.
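Here is a toy simulation of the batching trade-off, assuming made-up cost constants (a fixed per-call overhead plus a per-item cost). Larger batches amortize the overhead, so throughput rises, but each item waits longer for its whole batch to complete, so latency rises too:

```python
import time

PER_ITEM_COST_S = 0.001      # 1 ms of work per item (illustrative)
PER_CALL_OVERHEAD_S = 0.005  # 5 ms fixed overhead per call (illustrative)

def process_batch(items):
    """Simulate a call whose cost is fixed overhead plus per-item work."""
    time.sleep(PER_CALL_OVERHEAD_S + PER_ITEM_COST_S * len(items))

n_items = 200
for batch_size in (1, 10, 100):
    start = time.perf_counter()
    for i in range(0, n_items, batch_size):
        process_batch(range(batch_size))
    elapsed = time.perf_counter() - start
    # Throughput rises with batch size because the overhead is amortized...
    throughput = n_items / elapsed
    # ...but an individual item now waits for its entire batch to finish.
    item_latency_ms = (PER_CALL_OVERHEAD_S + PER_ITEM_COST_S * batch_size) * 1000
    print(f"batch={batch_size:>3}: {throughput:6.0f} items/sec, "
          f"~{item_latency_ms:.0f} ms per item")
```

Running this shows throughput climbing from roughly 170 to 950 items/sec as batch size grows, while per-item latency climbs from about 6 ms to over 100 ms.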
Best Practice
For most systems, set an acceptable latency target first, then maximize throughput within that budget.
Example: A search engine should return results quickly (low latency) but also handle millions of queries per second (high throughput).
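One common way to operationalize this, sketched below with made-up numbers, is to set a latency budget at a high percentile (a hypothetical p99 target of 100 ms here) and then push throughput as far as possible without breaking it:

```python
import random
import statistics
import time

LATENCY_BUDGET_MS = 100.0  # hypothetical p99 target

def handle_query():
    """Stand-in for a query: most are fast, a few are slow."""
    time.sleep(random.choice([0.01] * 95 + [0.08] * 5))

latencies_ms = []
start = time.perf_counter()
for _ in range(200):
    t0 = time.perf_counter()
    handle_query()
    latencies_ms.append((time.perf_counter() - t0) * 1000)
elapsed = time.perf_counter() - start

# 99th percentile of observed latencies.
p99 = statistics.quantiles(latencies_ms, n=100)[98]
print(f"throughput: {len(latencies_ms) / elapsed:.0f} qps, p99: {p99:.0f} ms")
print("within budget" if p99 <= LATENCY_BUDGET_MS else "over budget")
```

Tracking a high percentile rather than the average matters because a small fraction of slow requests can dominate user-perceived performance even when the mean looks healthy.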