Foundations Beginner 5 min read

Back-of-Envelope Math for System Design

How to estimate QPS, storage, bandwidth, and latency budget on a whiteboard. The numbers every system design interview and capacity-planning exercise reuses.

Table of contents
  1. When does back-of-envelope arithmetic actually save you?
  2. What four numbers should every estimate produce?
  3. What round constants should I memorise?
  4. How does a 30-second estimate flow on a whiteboard?
  5. What does this look like in .NET capacity planning?
  6. Where do back-of-envelope estimates fail?
  7. When should you skip back-of-envelope and just measure?
  8. Where should you go from here?

A senior engineer in a system design interview is given the prompt "design Twitter". They reach for a marker, write a column of four numbers - 100M DAU, 50K QPS, 200 TB/year, p99 < 100 ms - and the rest of the conversation flows from there. That column is back-of-envelope math, and the discipline behind it is what this chapter teaches.

When does back-of-envelope arithmetic actually save you?

Three situations.

First, interview opening. The interviewer expects you to estimate load before drawing boxes. Skipping the estimate signals that you do not know which architecture is overkill. Two minutes of math earn the right to propose a "boring" Postgres-only design, or to defend a sharded Cassandra cluster.

Second, capacity planning at work. Before you spin up the production database tier, you owe the team an answer to "how big". Overprovisioning wastes money; underprovisioning means a 3 AM incident. The math fits in a Slack thread.

Third, architecture review. Someone proposes Kafka for an event stream. Is it warranted? Estimate the QPS - if it is 10/s, no. If it is 100K/s, yes. The estimate ends the debate.

What four numbers should every estimate produce?

Every back-of-envelope exercise lands on the same four numbers, because they map to the same four cost lines on a cloud bill:

  1. Peak QPS - sizes the compute tier (web and app servers).
  2. Storage growth - sizes the database and object-store spend.
  3. Bandwidth - sizes network egress, the stealth line item.
  4. Latency budget - sizes the cache and CDN tiers needed to hit the SLO.

The QPS and storage numbers are the most important; bandwidth and latency confirm the design afterwards.

What round constants should I memorise?

Memorise these once and never compute them again:

TIME
1 day            ~ 100,000 seconds  (actually 86,400, round up for headroom)
1 month          ~ 30 days          ~ 2.5M seconds
1 year           ~ 30M seconds

DATA
1 small text row     1 KB         (e.g. tweet, comment, log line)
1 thumbnail image    50 KB
1 photo              500 KB
1 short video        10 MB
1 row in users table 200 bytes

NETWORK
LAN round-trip       0.5 ms
Cross-AZ round-trip  1-2 ms
Cross-region         50-150 ms
Internet round-trip  10-200 ms

DISK / MEMORY
RAM read            100 ns
SSD random read     100 µs
SSD sequential      300 MB/s
HDD random read     10 ms
HDD sequential      100 MB/s

CPU SCALES
1 modern core       100K-1M simple ops/sec
1 ASP.NET Core box  ~10K simple QPS, ~1K with EF Core query
1 Postgres node     5K-20K writes/sec, 30K-100K reads/sec
1 Redis node        100K-1M ops/sec
1 Kafka broker      ~1M msg/sec at default settings

These are generous round numbers; real measurements are usually within 2x. The 100K seconds/day trick is the most useful: 1000 RPS becomes 100M req/day in your head with no calculator.
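The day-to-year trick is pure integer arithmetic; a throwaway sketch (class and variable names invented here) shows the conversions using the rounded constants from the table:

```csharp
using System;

class RoundNumbers
{
    // Rounded constants from the table above.
    const long SecondsPerDay  = 100_000;       // actually 86,400
    const long SecondsPerYear = 30_000_000;    // actually ~31.5M

    static void Main()
    {
        long rps = 1_000;
        Console.WriteLine($"{rps} RPS = {rps * SecondsPerDay / 1_000_000}M req/day");       // 100M
        Console.WriteLine($"{rps} RPS = {rps * SecondsPerYear / 1_000_000_000}B req/year"); // 30B
    }
}
```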

How does a 30-second estimate flow on a whiteboard?

Take Twitter as the canonical example. The interviewer says "100M daily active users, design the timeline". The mental flow:

flowchart LR
    DAU[100M DAU] --> QPS[Peak QPS<br/>= 100M / 100k s<br/>* 5 peak factor<br/>= 5K req/s reads<br/>= 500 req/s writes]
    QPS --> Storage[Storage<br/>= 500 writes/s<br/>* 100k s/day<br/>* 1 KB<br/>* 365 days<br/>= 18 TB/year]
    Storage --> BW[Bandwidth<br/>= 5K read/s<br/>* 50 KB tweet+meta<br/>= 250 MB/s out]
    BW --> Lat[Latency<br/>p99 100 ms<br/>= 50ms cache<br/>+ 30ms net<br/>+ 20ms render]

That is the 30-second budget. The rest of the conversation - which database, how to shard, what to cache - refers back to those four numbers. If somebody proposes a single Postgres node, you can say "500 writes/s fits one box, but 18 TB/year forces partitioning by year" and the design follows.
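The same flow fits in a few lines of C# - a sketch of the arithmetic in the diagram, with the 10:1 read/write ratio and 50 KB response size taken as the diagram's assumptions:

```csharp
using System;

class TwitterEnvelope
{
    static void Main()
    {
        long dau           = 100_000_000;
        long secondsPerDay = 100_000;                       // rounded 86,400
        long peakFactor    = 5;

        long peakReads  = dau / secondsPerDay * peakFactor; // 5,000 req/s
        long peakWrites = peakReads / 10;                   // 10:1 read/write -> 500 req/s

        long storageBytesPerYear = peakWrites * secondsPerDay * 1_000 * 365; // 1 KB per tweet
        long egressBytesPerSec   = peakReads * 50_000;                       // 50 KB tweet + meta

        Console.WriteLine($"reads   {peakReads}/s, writes {peakWrites}/s");
        Console.WriteLine($"storage {storageBytesPerYear / 1_000_000_000_000} TB/year"); // 18
        Console.WriteLine($"egress  {egressBytesPerSec / 1_000_000} MB/s out");          // 250
    }
}
```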

What does this look like in .NET capacity planning?

Translate to dollars by working backward from the numbers above. Suppose you are designing the URL shortener chapter:

// Estimated workload, captured as constants for sanity-check tests:
public static class CapacityEstimate
{
    public const int    DailyShortens     = 1_000_000;       // 1M new URLs/day
    public const int    DailyRedirects    = 100_000_000;     // 100M reads/day (100:1 read/write)
    public const int    PeakRedirectsPerSec = DailyRedirects / 100_000 * 5; // 5K req/s peak
    public const int    AvgUrlBytes       = 200;             // short + long + meta
    public const long   StorageOneYearGb  = (long)DailyShortens * 365 * AvgUrlBytes / 1_000_000_000; // 73 GB
    public const int    CacheHitRatePct   = 90;              // hot 1% of URLs serve 90% of reads
    public const int    DbReadsPerSec     = PeakRedirectsPerSec * (100 - CacheHitRatePct) / 100; // 500 RPS
}
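The comment above promises sanity-check tests. A minimal self-contained sketch (plain runtime checks rather than a test framework, with the relevant constants mirrored from CapacityEstimate so it compiles on its own) could look like:

```csharp
using System;

class CapacitySanityCheck
{
    // Mirrored from CapacityEstimate above.
    const int PeakRedirectsPerSec = 100_000_000 / 100_000 * 5;      // 5K req/s
    const int DbReadsPerSec       = PeakRedirectsPerSec * 10 / 100; // 10% cache misses

    static void Main()
    {
        // One ASP.NET Core box serves ~10K simple QPS; peak fits with 2x headroom.
        if (PeakRedirectsPerSec > 10_000) throw new Exception("need more web nodes");
        // One Postgres node serves 30K+ reads/sec; 500 RPS is trivial.
        if (DbReadsPerSec > 30_000) throw new Exception("need read replicas");
        Console.WriteLine("single-node design holds");
    }
}
```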

Now the architecture choices have evidence:

  1. 5K peak redirects/s - one ASP.NET Core box handles ~10K simple QPS, so two stateless nodes give redundancy with headroom.
  2. 500 DB reads/s at peak - well inside a single Postgres node's read capacity.
  3. 73 GB/year of storage - fits on one disk for years; no sharding.
  4. 90% cache hit rate on the hot 1% of URLs - one Redis node absorbs the read traffic.

The whole design is one Postgres + one Redis + two stateless web nodes, and you can defend it numerically. Without the estimate, the same exercise produces an over-engineered Cassandra-and-Kafka nightmare.

Where do back-of-envelope estimates fail?

Two failure modes.

First, bursty workloads. The 5x peak factor in the Twitter example assumes diurnal traffic. If your workload spikes 100x for a flash sale, the average QPS is meaningless - design for the peak. Chapter 14 (rate limiting) shows how to cap the burst when you cannot scale to absorb it.
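For a concrete contrast (the order volume here is invented for illustration), a flash-sale workload with a 100x spike:

```csharp
using System;

class BurstSizing
{
    static void Main()
    {
        long dailyOrders = 1_000_000;
        long avgPerSec   = dailyOrders / 100_000;  // 10/s average - looks tiny
        long salePeak    = avgPerSec * 100;        // 100x spike -> 1,000/s
        Console.WriteLine($"average {avgPerSec}/s, flash-sale peak {salePeak}/s");
        // Size (or rate-limit) for 1,000/s; the average is meaningless here.
    }
}
```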

Second, storage that grows faster than write rate. Photos, videos, ML embeddings - the per-row size is large and you measure storage in bytes per second, not rows per second. Update the constant and the estimate works again, but a "small" service with 1M users uploading a 5 MB image each is still a 5 TB problem.

When should you skip back-of-envelope and just measure?

When the system already exists and you are tuning it. Real metrics from chapter 13 (observability) beat estimates by an order of magnitude. The estimate's job is to design the first version - to rule out clearly wrong architectures - not to compete with telemetry. Once the system is in production, the estimate becomes a sanity check on the dashboards, not the source of truth.

Where should you go from here?

Next chapter: CAP and consistency - the second vocabulary, the one that decides whether you can use one database or need to think about quorums. After that you have the full toolkit to start choosing concrete .NET building blocks.

Frequently asked questions

Why are these numbers so approximate?
Because the answer you need is the order of magnitude, not the exact figure. The decision 'one box vs sharded cluster' lives between 100 GB and 10 TB - a 5x estimation error never crosses that boundary. The estimate's job is to rule out architectures, not to predict the bill.
Where do the latency numbers come from?
Jeff Dean's 'Numbers Every Programmer Should Know' from 2010, updated for SSDs and 10 GbE. The headline numbers - 1 ns L1, 100 ns RAM, 100 µs SSD random read, 500 µs LAN round-trip, 50 ms cross-region - have stayed within a factor of two for fifteen years. They are the lingua franca of system design rooms.
How do I get good at this fast?
Pick three real services you use (Twitter, Uber, Spotify), guess their daily QPS in 30 seconds, then look up the public number. After ten guesses your intuition for 'busy' vs 'truly massive' becomes calibrated. The interview doesn't need a precise answer; it needs a defensible one.
Does this still matter when you can just measure it?
Yes - because at design time you don't have anything to measure. The estimate decides the first version of the system. Once it runs you replace estimates with metrics from chapter 13 (observability), but the architecture is already cast.