Back-of-the-Envelope Estimation

Yashveer Singh
5 min read

Introduction

System design isn’t just about drawing boxes and arrows or mentioning cool tech stacks like Redis, Kafka, or MongoDB. Before thinking about which technology to use or which database to pick, there is something far more crucial — knowing if your design will even work at the scale you want. This is where Back-of-the-Envelope Estimation (BOE) comes into play.

The term “Back-of-the-Envelope” signifies making quick and rough calculations — the kind you could scribble on the back of an envelope or a sticky note. These rough calculations are not meant to give exact numbers but to provide an intuition about the order of magnitude — are you talking about kilobytes or terabytes? Do you need a single server or a whole data center?

This blog covers this fundamental system design skill in depth.

Why Back-of-the-Envelope Estimation is Important

Imagine you’re designing Instagram and someone says, “We need to store 5 million new photos daily.” Without BOE, you have no clue if 1 server is enough, or 100, or 10,000. You’ll just be guessing technologies and architecture blindly.

Back-of-the-envelope estimation allows you to:

  1. Identify potential bottlenecks early.

    Maybe your system needs a CDN for media or a cache for user profiles — BOE tells you why.

  2. Avoid under or over-engineering.

    You don’t need to shard the database if one server can handle it — BOE clarifies this.

  3. Quickly discard infeasible ideas.

    If you realize you need 1 PB of memory, you’ll switch to disk-based storage or rethink the feature.

  4. Show the interviewer (or your team) that you deeply understand scale, latency, and cost.

Key Estimation Concepts Explained in Detail

1. Data Size Estimation

To estimate storage or bandwidth, you need to understand how data size is measured:

  • 1 Kilobyte (KB) = 1,024 bytes

  • 1 Megabyte (MB) = 1,024 KB

  • 1 Gigabyte (GB) = 1,024 MB

  • 1 Terabyte (TB) = 1,024 GB

  • 1 Petabyte (PB) = 1,024 TB

A simple image uploaded to a photo-sharing service may be 2 MB. If 1 million users upload 2 photos daily, you generate 4 Terabytes (TB) of data per day. Over a year, that’s 1.4 Petabytes — massive storage needs.

Similarly, text data (like tweets or posts) is tiny compared to media. A single tweet with 280 characters might occupy 300 bytes, while a photo or video would take millions of bytes.
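To make these conversions concrete, here is a minimal Python sketch using the assumed numbers from the example above (2 MB photos, 1 million users, 2 uploads per user per day):

```python
# Rough storage estimate for a photo-sharing service (assumed numbers).
PHOTO_SIZE_MB = 2              # assumed average photo size
USERS = 1_000_000              # assumed number of daily uploaders
PHOTOS_PER_USER_PER_DAY = 2

daily_mb = USERS * PHOTOS_PER_USER_PER_DAY * PHOTO_SIZE_MB
daily_tb = daily_mb / 1024 / 1024       # MB -> GB -> TB
yearly_pb = daily_tb * 365 / 1024       # TB -> PB

print(f"Daily:  {daily_tb:.1f} TB")     # ~3.8 TB (roughly the 4 TB above)
print(f"Yearly: {yearly_pb:.2f} PB")    # ~1.4 PB
```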

2. Network Bandwidth Estimation

Network capacity is critical. If your service needs to serve 10 million users streaming 4 Mbps videos simultaneously, that requires 40 Terabits per second — an impossible load for a single server or even a single data center.

Even small requests add up: a mobile app making five 2 KB requests per second per user, across 1 million users, generates roughly 10 GB per second of outgoing data.
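As a quick sanity check, here is a short Python sketch of the two scenarios above (10 million viewers at 4 Mbps, and 1 million users each making five 2 KB requests per second):

```python
# Back-of-the-envelope bandwidth checks (assumed numbers from the text).

# Streaming: 10M concurrent viewers at 4 Mbps each.
viewers = 10_000_000
bitrate_mbps = 4
total_tbps = viewers * bitrate_mbps / 1_000_000     # Mbps -> Tbps
print(f"Streaming egress: {total_tbps:.0f} Tbps")   # 40 Tbps

# Small requests: 1M users, 5 requests/sec each, 2 KB per response.
users = 1_000_000
requests_per_sec = 5
response_kb = 2
gb_per_sec = users * requests_per_sec * response_kb / 1024 / 1024
print(f"API egress: {gb_per_sec:.1f} GB/sec")       # ~9.5 GB/sec (≈10 GB/sec)
```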

3. CPU Usage Estimation

Let’s say each HTTP request requires 10 milliseconds of CPU time. If you expect 10,000 requests per second (QPS), you need:

10,000 QPS × 10 ms = 100,000 ms = 100 seconds of CPU time needed every second.

Since each CPU core can only process 1 second of work per second, you need at least 100 CPU cores running in parallel to keep up — and that’s without considering kernel overhead or I/O waits.
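The same arithmetic as a tiny Python sketch, using the assumptions stated above (10 ms of CPU per request, 10,000 QPS):

```python
# How many CPU cores a given request rate needs (assumed numbers).
qps = 10_000                    # expected requests per second
cpu_ms_per_request = 10         # assumed CPU time per request

cpu_seconds_per_second = qps * cpu_ms_per_request / 1000
cores_needed = cpu_seconds_per_second   # 1 core = 1 second of work per second

print(f"CPU work per wall-clock second: {cpu_seconds_per_second:.0f} s")  # 100
print(f"Minimum cores (no overhead or headroom): {cores_needed:.0f}")     # 100
```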

4. Latency Awareness

Latency matters greatly in user experience. Some basic latency numbers every engineer should memorize:

  • Reading from L1 CPU cache: less than 1 nanosecond

  • Reading from main memory (RAM): ~100 nanoseconds

  • Fetching from SSD: ~100 microseconds

  • Fetching from disk: ~10 milliseconds

  • Data travel between US coasts: ~100 milliseconds

A single disk seek (~10 ms) is roughly 100,000 times slower than a main-memory read and millions of times slower than an L1 cache hit! This is why high-performance systems cache aggressively and minimize disk access.
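A quick way to internalize these gaps is to print each number as a multiple of the fastest operation; the sketch below uses the approximate figures listed above:

```python
# Relative cost of common operations (order-of-magnitude figures only).
LATENCY_NS = {
    "L1 cache hit": 1,            # < 1 ns, rounded up
    "RAM read": 100,              # ~100 ns
    "SSD read": 100_000,          # ~100 microseconds
    "Disk seek": 10_000_000,      # ~10 milliseconds
}

baseline = LATENCY_NS["L1 cache hit"]
for name, ns in LATENCY_NS.items():
    print(f"{name:<14} {ns:>12,} ns  ({ns / baseline:>12,.0f}x L1 cache)")
```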

5. Availability and the Concept of “Nines”

High availability is measured in nines:

  • 99% (two nines) = ~3.65 days of downtime per year.

  • 99.9% (three nines) = ~8.76 hours downtime/year.

  • 99.99% (four nines) = ~52.6 minutes downtime/year.

  • 99.999% (five nines) = ~5.26 minutes downtime/year.

Achieving more nines requires costlier, complex solutions like load balancing, redundant servers, failovers, and distributed storage.
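The downtime figures follow directly from the unavailable fraction of a year; a minimal sketch:

```python
# Downtime per year implied by an availability target ("number of nines").
SECONDS_PER_YEAR = 365 * 24 * 3600

for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime_s = SECONDS_PER_YEAR * (1 - availability)
    print(f"{availability * 100:.3f}% -> "
          f"{downtime_s / 3600:8.2f} hours "
          f"({downtime_s / 60:9.2f} minutes) downtime/year")
```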

Real Example: Estimating a Twitter-like Service

Problem Statement:

Build a Twitter-like service to handle 300 million users.


Assumptions:

  • 300 million monthly active users (MAU)

  • 50% daily active users (DAU) = 150 million

  • Each user tweets twice a day

  • Each tweet = 300 bytes

  • 10% of tweets include media (average size 1 MB)

  • Each user loads timeline 5 times a day

  • Tweets are read 10× more than they are written (read amplification)

Write QPS Calculation:

Total tweets/day: 150M users × 2 tweets/day = 300M tweets/day

Write QPS: 300M tweets/day ÷ 86,400 sec/day ≈ 3,472 writes/sec (QPS)

Peak (assume 2× the average): ~7,000 writes/sec

Storage Estimation:

  • Text Tweets: 300M × 300 bytes = 90 GB/day

  • Media Tweets: 10% of 300M tweets = 30M tweets with media; 30M × 1 MB = 30 TB/day

  • Total storage in 5 years (media only): 30 TB/day × 365 days × 5 years = 54,750 TB ≈ 54.75 PB

Read QPS Calculation:

Each DAU checks timeline 5 times a day:

150M users × 5 = 750M timeline loads/day

750M ÷ 86,400 = ~8,680 reads/sec (QPS)

Bandwidth Estimation:

  • Media delivery requires huge bandwidth.

  • Suppose average media file = 1 MB, 30M media tweets/day.

If 10% of DAUs each download one 1 MB media file per day:

150M × 10% × 1 MB = 15 TB/day ≈ 174 MB/sec of outgoing bandwidth on average
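The whole worked example fits in a few lines of Python; this sketch simply replays the assumptions above (and, like the hand calculation, uses decimal unit conversions for simplicity):

```python
# Reproduce the Twitter-like estimates from the stated assumptions.
SECONDS_PER_DAY = 86_400

dau = 150_000_000              # 50% of 300M MAU
tweets_per_user_per_day = 2
tweet_bytes = 300
media_ratio = 0.10             # 10% of tweets carry ~1 MB of media
media_mb = 1
timeline_loads_per_day = 5

tweets_per_day = dau * tweets_per_user_per_day                 # 300M
write_qps = tweets_per_day / SECONDS_PER_DAY                   # ~3,472
peak_write_qps = 2 * write_qps                                 # ~7,000

text_gb_per_day = tweets_per_day * tweet_bytes / 1e9           # ~90 GB
media_tb_per_day = tweets_per_day * media_ratio * media_mb / 1e6   # ~30 TB
media_pb_5_years = media_tb_per_day * 365 * 5 / 1e3            # ~55 PB

read_qps = dau * timeline_loads_per_day / SECONDS_PER_DAY      # ~8,680

media_egress_tb_per_day = dau * 0.10 * media_mb / 1e6          # ~15 TB
media_egress_mb_per_sec = media_egress_tb_per_day * 1e6 / SECONDS_PER_DAY

print(f"Write QPS: {write_qps:,.0f} (peak ~{peak_write_qps:,.0f})")
print(f"Storage:   {text_gb_per_day:,.0f} GB/day text, "
      f"{media_tb_per_day:,.0f} TB/day media, "
      f"~{media_pb_5_years:,.0f} PB of media over 5 years")
print(f"Read QPS:  {read_qps:,.0f}")
print(f"Media egress: {media_egress_tb_per_day:,.0f} TB/day "
      f"≈ {media_egress_mb_per_sec:,.0f} MB/sec")
```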

Conclusion

Back-of-the-envelope estimation is not about accuracy; it’s about clarity. Before designing distributed storage, load balancers, or messaging queues, always check if the system scale makes sense.

Great engineers and interviewees use BOE to:

  • Spot bottlenecks before they occur.

  • Argue design choices with data, not gut feel.

  • Think at the magnitude of terabytes, millions, or billions.

Doing this well sets you apart in interviews and real-world system design.


Written by

Yashveer Singh