System Design Interview Cheat Sheet: Numbers, Formulas, and Patterns

June 6, 2026 · 9 min read
Tags: system-design, interview-prep, career, algorithms
TL;DR
  • L1 cache is 200x faster than RAM; RAM is 1,000x faster than SSD; a cross-continent round trip (150ms) is 300x slower than a same-datacenter hop (500µs).
  • Three capacity estimation formulas cover most back-of-envelope problems: QPS = (DAU × requests per user per day) / 86,400; storage/year = write QPS × payload × 31.5M; bandwidth (Gbps) = QPS × bytes × 8 / 1B.
  • Component throughput ceilings: Redis handles 100K–500K ops/sec; PostgreSQL starts struggling past 10K write QPS; Cassandra scales writes horizontally with no hard ceiling.
  • Default to SQL; switch to NoSQL only when write volume outpaces a single primary instance or schema flexibility is the explicit constraint.
  • Cache-aside is the default caching pattern: check cache first, fall back to the database on a miss, write the result to cache — best for read-heavy workloads.
  • Scaling sequence in order: single server → load balancer → read replicas → cache → sharding → CDN → message queue → extract services.
  • Most systems are read-heavy, with reads making up 80–90% of traffic: reach for caching and read replicas first; write-heavy systems (logging, IoT) want queues and append-optimized storage.

Thirty minutes before your system design interview, you don't want an essay. You want the numbers already loaded into working memory, the formulas ready to run, and the decision rules sitting at the front of your brain.

This is that cheat sheet.

No theory, no deep dives. Just the facts and frameworks you will reach for in the next forty-five minutes. You already know this stuff. You just need it on the surface.

The Latency Table Is Not Optional

Every experienced interviewer knows these numbers cold. If you don't, you can't sanity-check your estimates on the fly, and your "back of the envelope" looks like a guess.

In the real world you'd have a browser tab open for this. You do not have that tab right now.

The key insight isn't the exact numbers. It's the orders of magnitude. L1 cache is ~200x faster than main memory. Main memory is over 1,000x faster than an SSD random read. An SSD random read is ~70x faster than an HDD seek. A cross-continent round trip is ~300x slower than a same-datacenter hop.

Operation                               | Latency
L1 cache reference                      | 0.5 ns
L2 cache reference                      | 7 ns
Main memory reference                   | 100 ns
Send 1 KB across 1 Gbps network         | 10 µs
SSD random read (4 KB)                  | 150 µs
Same-datacenter round trip              | 500 µs
Read 1 MB sequentially from SSD         | 1 ms
HDD seek                                | 10 ms
Read 1 MB sequentially from HDD         | 20 ms
Cross-continent round trip (CA to EU)   | 150 ms

Design implication: a value cached in Redis (sub-millisecond) replaces a cross-datacenter call (tens of milliseconds). That's a 100x latency win from one architectural decision.
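If you want to re-derive the ratios rather than memorize them, a minimal sketch is below. It only divides entries from the latency table; the dictionary keys and the `ratio` helper are names made up for this illustration.

```python
# Latencies from the table above, converted to nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "main_memory": 100,
    "ssd_random_read": 150_000,
    "same_dc_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "cross_continent_round_trip": 150_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


print(ratio("main_memory", "l1_cache"))                            # 200.0
print(ratio("ssd_random_read", "main_memory"))                     # 1500.0
print(round(ratio("hdd_seek", "ssd_random_read")))                 # 67
print(ratio("cross_continent_round_trip", "same_dc_round_trip"))   # 300.0
```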

Three Formulas That Do All Your Estimates

Your interviewer says "back of the envelope." What they mean is: do this math out loud while I decide whether your instincts are calibrated. No pressure.

Most capacity estimation problems collapse into three numbers: QPS, storage, and bandwidth. Knowing how to derive each from DAU is the whole skill.

QPS

Average QPS = (DAU × avg requests per user per day) / 86,400
Peak QPS ≈ average QPS × 2 to 5

Example: 50M DAU, 10 requests per user per day.

  • Average QPS = 500M / 86,400 ≈ 5,800
  • Peak QPS ≈ 12,000 to 29,000
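The same arithmetic as a small sketch, in case it helps to see which input moves which output. The function names are illustrative, and the 2x to 5x peak multipliers are the rule of thumb from above, not measured values.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average queries per second derived from daily active users."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps_range(avg_qps: float, low: float = 2.0, high: float = 5.0):
    """Rule-of-thumb peak range: average QPS times 2 to 5."""
    return avg_qps * low, avg_qps * high


avg = average_qps(50_000_000, 10)
peak_low, peak_high = peak_qps_range(avg)
print(round(avg))                          # 5787
print(round(peak_low), round(peak_high))   # 11574 28935
```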

Storage per year

Storage/year = write QPS × payload size (bytes) × 31,536,000 (seconds in a year)

Example: 5,800 QPS, 500-byte records, assuming every request writes one record.

  • 5,800 × 500 × 31,536,000 ≈ 91 TB/year
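A sketch of the storage estimate follows. It assumes every counted request writes exactly one record, so the QPS here is write QPS. The `replication_factor` parameter is an addition for illustration and is not part of the formula above; leave it at 1 to reproduce the example.

```python
SECONDS_PER_YEAR = 31_536_000  # 365 days


def storage_per_year_bytes(write_qps: int, payload_bytes: int,
                           replication_factor: int = 1) -> int:
    """Raw bytes written per year, assuming one record stored per write."""
    return write_qps * payload_bytes * SECONDS_PER_YEAR * replication_factor


single_copy = storage_per_year_bytes(5_800, 500)
print(round(single_copy / 1e12, 1))   # 91.5 (TB)

# Hypothetical: three replicas of every record.
three_copies = storage_per_year_bytes(5_800, 500, replication_factor=3)
print(round(three_copies / 1e12, 1))  # 274.4 (TB)
```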

Bandwidth

Bandwidth (Gbps) = QPS × response size (bytes) × 8 / 1,000,000,000

Example: 5,800 QPS, 100 KB responses.

  • 5,800 × 100,000 × 8 / 1B ≈ 4.6 Gbps
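The bandwidth formula as a one-function sketch. The multiplication by 8 converts bytes to bits, and the function name is illustrative.

```python
def bandwidth_gbps(qps: int, response_bytes: int) -> float:
    """Outbound bandwidth in gigabits per second."""
    bits_per_second = qps * response_bytes * 8
    return bits_per_second / 1_000_000_000


print(bandwidth_gbps(5_800, 100_000))  # 4.64
```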

One habit worth building: state your assumptions before calculating. Your interviewer will often adjust DAU or payload size. Show you know which number changes which knob. It also buys you ten seconds of thinking time, which is not a side effect.

The Data Sizes You Need Cold

KB ≈ 10^3 bytes (2^10)     MB ≈ 10^6 bytes (2^20)
GB ≈ 10^9 bytes (2^30)     TB ≈ 10^12 bytes (2^40)
PB ≈ 10^15 bytes (2^50)
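The powers of two and powers of ten are close, not equal, and the gap widens with each unit. A quick sketch of how far apart they drift (the helper name is illustrative):

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def binary_vs_decimal_gap(step: int) -> float:
    """Percent by which 2^(10*step) exceeds 10^(3*step)."""
    binary = 2 ** (10 * step)
    decimal = 10 ** (3 * step)
    return (binary / decimal - 1) * 100


for step, unit in enumerate(UNITS, start=1):
    print(f"{unit}: binary unit is {binary_vs_decimal_gap(step):.1f}% larger")
```

The gap runs from about 2.4% at KB to about 12.6% at PB, which is small enough to ignore in an interview estimate.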

Practical sizes worth loading into working memory:

  • One character: 1 byte
  • A tweet with metadata: ~500 bytes
  • A compressed image: 1 to 5 MB
  • One minute of HD video (compressed): ~100 to 200 MB
  • A typical database row: 100 to 500 bytes
  • A short URL record: ~200 to 500 bytes

What a Single Instance Can Actually Handle

These are rough single-instance maximums. They exist to tell you when you've outgrown a component, not to predict production performance.

Component    | Approximate Throughput
Redis        | 100K to 500K ops/sec
PostgreSQL   | 5K to 50K QPS (scales with query complexity)
MySQL        | 5K to 30K QPS (faster for simple OLTP reads)
Kafka        | 100K to 500K messages/sec per broker
Cassandra    | 10K to 50K writes/sec per node, scales linearly

The number that changes your design is the point where a single component runs out of headroom. PostgreSQL at ~10K write QPS is a signal to shard; read replicas relieve read load, not write load. Cassandra is the choice when you need writes to scale horizontally from the start.
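One way to turn the table into a headroom check is sketched below. It uses the lower bound of each range as a conservative ceiling. The 70% target utilization is an assumption chosen for illustration, not a figure from the table.

```python
import math

# Lower bounds of the ranges in the table above (operations per second).
CONSERVATIVE_CEILING = {
    "redis": 100_000,
    "postgresql": 5_000,
    "mysql": 5_000,
    "kafka": 100_000,
    "cassandra": 10_000,
}


def instances_needed(component: str, peak_ops_per_sec: int,
                     target_utilization: float = 0.7) -> int:
    """Instances required to serve the peak while staying under the target."""
    usable = CONSERVATIVE_CEILING[component] * target_utilization
    return math.ceil(peak_ops_per_sec / usable)


print(instances_needed("redis", 29_000))      # 1
print(instances_needed("cassandra", 29_000))  # 5
```

A result above 1 is the cue to talk about replicas, partitioning, or sharding for that component.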

SQL or NoSQL?

Default to SQL. Switch to NoSQL when a specific constraint forces the change, not before.

The NoSQL hype cycle has convinced a lot of engineers that Cassandra is the mature choice. It is not the mature choice for a URL shortener with 1,000 users. Reaching for distributed wide-column storage before you've saturated a single Postgres instance is the system design equivalent of buying a cargo ship for your commute.

Use SQL (PostgreSQL, MySQL) when:

  • You need ACID transactions (payments, inventory, reservations)
  • Your queries involve JOINs across related tables
  • Write volume stays under ~10K QPS on one instance
  • Strong consistency is required

Use NoSQL when:

  • You need horizontal write scaling beyond a single SQL primary
  • Your schema changes frequently or data is semi-structured
  • You're storing time series, graph relationships, or large blobs
  • Eventual consistency is acceptable

For a deeper comparison: SQL vs NoSQL for system design interviews.
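The two checklists can be read as a small decision function. This is a sketch of the rule of thumb, not a selection algorithm: the field names are hypothetical, and returning "SQL, sharded" when relational needs collide with high write volume is an assumption made for this example.

```python
from dataclasses import dataclass

SINGLE_PRIMARY_WRITE_LIMIT = 10_000  # the ~10K write QPS rule of thumb


@dataclass
class Requirements:
    peak_write_qps: int
    needs_acid_transactions: bool = False
    needs_joins: bool = False
    needs_strong_consistency: bool = False
    schema_changes_frequently: bool = False


def suggest_store(req: Requirements) -> str:
    """Default to SQL; move away only when a specific constraint forces it."""
    relational_needs = (req.needs_acid_transactions or req.needs_joins
                        or req.needs_strong_consistency)
    exceeds_primary = req.peak_write_qps > SINGLE_PRIMARY_WRITE_LIMIT
    if relational_needs:
        return "SQL, sharded" if exceeds_primary else "SQL"
    if exceeds_primary or req.schema_changes_frequently:
        return "NoSQL"
    return "SQL"


print(suggest_store(Requirements(peak_write_qps=200)))     # SQL
print(suggest_store(Requirements(peak_write_qps=50_000)))  # NoSQL
print(suggest_store(Requirements(peak_write_qps=50_000,
                                 needs_acid_transactions=True)))  # SQL, sharded
```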

The CAP theorem is vocabulary for explaining your choice, not a framework for making it. In practice, all distributed systems must tolerate network partitions, so the real decision is CP vs AP.

Classification                         | Databases
CP (consistent + partition-tolerant)   | MongoDB, Redis, Zookeeper
AP (available + partition-tolerant)    | Cassandra, CouchDB, DynamoDB

Full guide: CAP theorem explained.

Which Caching Pattern Should You Default To?

There are four patterns. Cache-aside is the default; the others appear when write latency or consistency is the specific problem you're solving.

  • Cache-aside: the app checks the cache first, falls back to the database on a miss, then writes the result into the cache. Good for read-heavy workloads with tolerable staleness.
  • Write-through: writes go to the cache and the database together, keeping the cache warm but adding write latency.
  • Write-back (write-behind): writes hit the cache immediately and the database is updated asynchronously. Fast, but risks data loss on a crash.
  • Read-through: the cache sits in front of the database and handles population automatically, simplifying app logic.

Detailed breakdown: caching strategies for system design.
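A minimal sketch of cache-aside, using plain dictionaries as stand-ins for Redis and the database. The class and method names are hypothetical. Invalidating the cached entry on update is one common choice assumed here; the description above does not prescribe it.

```python
class CacheAside:
    """Cache-aside reads over an in-memory stand-in for a database."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for the real database
        self.cache = {}           # stand-in for Redis or Memcached
        self.db_reads = 0         # counts how often the cache was missed

    def get(self, key):
        # 1. Check the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, fall back to the database.
        self.db_reads += 1
        value = self.database.get(key)
        # 3. Write the result into the cache for the next reader.
        if value is not None:
            self.cache[key] = value
        return value

    def update(self, key, value):
        # Write to the database, then drop the stale cache entry (assumed policy).
        self.database[key] = value
        self.cache.pop(key, None)


store = CacheAside({"user:1": "Ada"})
store.get("user:1")    # miss: reads the database, fills the cache
store.get("user:1")    # hit: served from the cache
print(store.db_reads)  # 1
```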

Scale in This Order. Seriously, This Order.

Almost every system design interview follows the same arc. Knowing the scaling sequence means you never paint yourself into a corner mid-answer.

The temptation is to skip steps 1 through 7 and arrive immediately at "we have fifty microservices communicating over gRPC with service mesh and a distributed trace for each request." Resist it. Interviewers can smell premature complexity from across a video call.

1. Single server. One app, one database. Establish baseline QPS.

2. Load balancer plus multiple app servers. Stateless horizontal scaling. Removes the single point of failure.

3. Read replicas. Separate read traffic from the primary. Most read-heavy systems scale here before sharding becomes necessary.

4. Caching layer. Redis or Memcached in front of the database. Roughly 80% of read traffic is cacheable. Drops database load dramatically.

5. Database sharding. Horizontal partitioning when writes outpace a single primary. Shard key selection is the hard part. Consistent hashing handles node additions gracefully. Details: consistent hashing for system design interviews.

6. CDN for static assets. Images, video, and JavaScript served from the network edge. Cuts origin bandwidth and reduces latency for geographically distributed users.

7. Message queue for async processing. Kafka or a managed queue decouples producers from consumers and absorbs traffic spikes without dropping requests.

8. Extract services when they hit independent scaling limits. Separate the notification service. Separate the search index. Don't microservice everything from day one.

Figure: The scaling progression from single server to extracted services.

Each step earns its complexity. Skip one and the interviewer will ask why.
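Step 5 mentions consistent hashing. The sketch below is one illustrative way to build a hash ring with virtual nodes, to show why adding a shard moves only a fraction of the keys. The class name, the shard names, MD5 as the hash, and 100 virtual nodes per shard are all assumptions for this example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def get_node(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # First position clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["shard-a", "shard-b", "shard-c"])
before = {key: ring.get_node(key) for key in keys}

ring.add_node("shard-d")
after = {key: ring.get_node(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to shard-d")
```

With four shards, roughly a quarter of the keys should move, and every moved key lands on the new shard. With naive `hash(key) % shard_count` placement, most keys would change shards.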

Read-Heavy vs Write-Heavy: Two Different Problems

Read-heavy systems (social feeds, product catalogs, Wikipedia) want caching, read replicas, and CDNs. One write path, many read paths in front of it.

Write-heavy systems (logging, IoT telemetry, real-time analytics) want append-optimized databases, message queues, and stream processing. Cassandra and Kafka live here. Buffer writes, process in batches, serve reads from a derived store.

In most systems, reads make up 80 to 90% of traffic, which is why caching is almost always the first optimization worth proposing. If you say "add Redis" in your first design pass, you're usually right, and your interviewer knows you know the shape of the problem.
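To put a number on why the cache comes first, here is a rough sketch of the load that still reaches the database. It assumes writes always hit the database and cache hits never do. The 90% read share and 80% hit rate are example inputs, not measurements.

```python
def database_qps(total_qps: float, read_fraction: float,
                 cache_hit_rate: float) -> float:
    """Queries per second that still reach the database behind a cache."""
    reads = total_qps * read_fraction
    writes = total_qps - reads
    cache_misses = reads * (1 - cache_hit_rate)
    return cache_misses + writes


# 5,800 QPS total, 90% reads, 80% of reads served from cache.
print(round(database_qps(5_800, 0.9, 0.8)))  # 1624
# Same traffic with no cache at all.
print(round(database_qps(5_800, 0.9, 0.0)))  # 5800
```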

Real-World Numbers to Calibrate Your Intuition

A few scale references that show up in estimates:

  • Twitter: ~300M DAU, ~500M tweets per day, ~6,000 tweets per second average
  • YouTube: ~2B monthly users, ~500 hours of video uploaded per minute
  • WhatsApp: ~100B messages per day, ~1.15M messages per second

You won't be tested on these exact figures, but knowing the order of magnitude prevents serious estimation errors. Arriving at "we need 600,000 servers" without pausing to sanity-check it is a yellow flag that you've lost the thread.
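The per-second figures above are just the daily totals divided by 86,400, and the same habit works for checking a server count. In the sketch below, the 1,000 QPS per app server is a made-up capacity for illustration only.

```python
import math

SECONDS_PER_DAY = 86_400


def per_second(per_day: int) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def servers_needed(peak_qps: float, qps_per_server: float) -> int:
    """Servers required to absorb the peak at a given per-server capacity."""
    return math.ceil(peak_qps / qps_per_server)


print(round(per_second(500_000_000)))      # 5787 tweets/sec
print(round(per_second(100_000_000_000)))  # 1157407 messages/sec

# Hypothetical capacity of 1,000 QPS per app server.
print(servers_needed(29_000, 1_000))       # 29
```

If a result like this comes out in the hundreds of thousands, recheck the inputs before saying it out loud.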

Quick Reference

  • L1 cache is ~200x faster than RAM. RAM is ~1,000x faster than SSD. Cross-continent is ~300x slower than same-datacenter.
  • Average QPS = (DAU × requests/day) / 86,400. Peak is 2 to 5x.
  • Storage/year = write QPS × payload size (bytes) × 31.5M (seconds per year).
  • Redis handles ~100K to 500K ops/sec. PostgreSQL starts struggling at ~10K write QPS. Cassandra scales writes horizontally.
  • Default to SQL. Switch to NoSQL when writes outpace a single instance or schema flexibility is needed.
  • Scaling sequence: single server, load balancer, read replicas, cache, sharding, CDN, queue, services.
  • Read-heavy: cache and replicas. Write-heavy: queue and append-optimized storage.

The numbers only help if you can put them into a live conversation under pressure. That's a different skill from memorization and it takes practice. If you want to run actual walkthroughs where you say the estimates out loud and get rubric-based feedback in real time, SpaceComplexity runs voice-based system design mock interviews that score your reasoning, not just your diagrams.

Your interview is in thirty minutes. You have this.
