Caching Strategies for System Design: Know When to Use Each One

Caching strategies come up in almost every system design interview, and the conversation usually hits the same wall. Traffic is growing, the database is sweating, and the interviewer raises an eyebrow. You say "add a cache" and feel briefly heroic.

Then they ask: "Which caching strategy would you use?"

If your answer is "Redis," you just answered a different question. Redis is the tool. Cache-aside, write-through, and write-back are the strategies. Confusing the two is one of the most common ways a system design answer stalls out at the senior level, right after the interviewer puts down their pen.

[Image: thinking you solved caching, then getting hit by invalidation, then thundering herd.] Adding a cache: act one, problem solved. Act two, problem introduced.

Every Strategy Answers Three Questions

Before picking a strategy, nail down three things:

Read path: when data isn't in the cache, where does it come from and who's responsible?
Write path: when data changes, does the cache update immediately or later?
Failure mode: if the cache goes down, what breaks?

For any given system, the answers usually rule out all but one or two of the strategies below.

Cache-Aside: Start Here Unless You Have a Reason Not To

Cache-aside is the correct default for read-heavy systems. The application code owns the cache logic: check the cache, handle the miss, populate the cache. Nothing is automatic.

READ PATH

App ──► Cache GET ──► HIT ──► return data
              │
             MISS
              │
              ▼
        DB SELECT ──► Cache SET ──► return data

WRITE PATH

App ──► DB WRITE ──► Cache DELETE (or expire) ──► ack

On a read, you ask the cache first. Miss means a round trip to the database, then you store the result. On a write, you update the database and invalidate the cache entry. The cache refills naturally on the next read.
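The read and write paths above can be sketched in a few lines of Python. This is a minimal illustration, not a production client: two in-memory dicts stand in for the cache and the database, and the class name `CacheAsideStore`, the `db_reads` counter, and the 60-second TTL are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 60.0  # assumed TTL, chosen only for illustration


class CacheAsideStore:
    """Cache-aside over two in-memory dicts standing in for a cache and a database."""

    def __init__(self, db: Dict[str, str], clock: Callable[[], float] = time.monotonic):
        self.db = db                                    # hypothetical stand-in for the database
        self.cache: Dict[str, Tuple[str, float]] = {}   # key -> (value, expires_at)
        self.clock = clock
        self.db_reads = 0                               # how many reads fell through to the DB

    def read(self, key: str) -> Optional[str]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                             # HIT: served from the cache
        # MISS (or expired entry): read the DB, then populate the cache
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = (value, self.clock() + TTL_SECONDS)
        return value

    def write(self, key: str, value: str) -> None:
        self.db[key] = value                            # 1. the database is the source of truth
        self.cache.pop(key, None)                       # 2. invalidate; the next read refills


store = CacheAsideStore(db={"user:1": "Ada"})
store.read("user:1")              # miss: goes to the DB and fills the cache
store.read("user:1")              # hit: served from the cache
store.write("user:1", "Grace")    # DB updated, cache entry deleted
print(store.read("user:1"), store.db_reads)  # Grace 2
```

Note that the application code does all the work here: nothing populates or invalidates the cache unless `read` or `write` does it explicitly.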

Why it's the default: only the data that actually gets requested ends up in the cache. If 10% of your user records account for 90% of reads (typical Zipf distribution), cache-aside holds only that 10%. Write-through would have loaded everything, burning memory on records nobody reads.

The miss penalty is real: three round trips (cache miss, DB read, cache write) instead of one cache hit. On cold start or after a large invalidation event, your database sees a spike. That's the thundering herd problem. Every request that hits an empty cache runs straight to your database at the same time, simultaneously, all at once, as a group. You can soften it with staggered TTLs and probabilistic early expiry, but it doesn't disappear.
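Both mitigations are small enough to sketch. The helper names below (`jittered_ttl`, `should_refresh_early`) and their parameters are hypothetical; the early-expiry check follows the commonly described probabilistic scheme (often called XFetch), in which each request may decide to refresh a little before the real expiry, with the odds rising as expiry approaches.

```python
import math
import random
import time
from typing import Optional

_RNG = random.Random()  # module-level generator; pass your own for reproducible runs


def jittered_ttl(base_ttl: float, spread: float = 0.1, rng: random.Random = _RNG) -> float:
    """Staggered TTL: base_ttl moved by up to +/- spread (a fraction of base_ttl),
    so keys cached together do not all expire in the same instant."""
    return base_ttl * (1.0 + rng.uniform(-spread, spread))


def should_refresh_early(expires_at: float, recompute_seconds: float, beta: float = 1.0,
                         now: Optional[float] = None, rng: random.Random = _RNG) -> bool:
    """Probabilistic early expiry: return True if this request should recompute
    the value now instead of waiting for the entry to expire.

    recompute_seconds is how long a refresh takes; beta > 1 refreshes earlier.
    """
    current = time.time() if now is None else now
    # 1 - random() lies in (0, 1], so log() is always defined and never positive.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng.random())
    return current + head_start >= expires_at


# Usage: store each entry with a jittered TTL, and on every hit ask whether
# this particular request should refresh the value ahead of time.
ttl = jittered_ttl(60.0)
expires_at = time.time() + ttl
refresh_now = should_refresh_early(expires_at, recompute_seconds=0.2)
```

Because only a few requests win the early-refresh lottery, the value is usually recomputed once before it expires rather than by every request at the moment it does.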

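The other risk is stale reads. If you delete the cache entry on every write, the window is small: a read that missed just before the write can still repopulate the old value, which is why a TTL is worth keeping as a backstop. If you rely entirely on TTL and your write hits the DB without touching the cache, readers see old data until the TTL expires. Explicit invalidation is more work but closes most of that window.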

Use cache-aside when: reads dominate, not every record needs to be warm, and you can tolerate the occasional miss penalty.

Write-Through: Pay the Write Tax, Get Fresh Reads

Write-through flips the contract. Every write goes to the cache and the database synchronously before the application gets an acknowledgment. The cache is always up to date.

READ PATH (same as cache-aside)

App ──► Cache GET ──► HIT ──► return data
              │
             MISS
              │
              ▼
        DB SELECT ──► Cache SET ──► return data

WRITE PATH

App ──► Cache WRITE ──► DB WRITE ──► ack
         (both must succeed)

Write-through buys read-after-write consistency. A user updates their order, then immediately refreshes the page. Cache-aside might show them the old data. Write-through never will.

The cost is doubled write latency. Every mutation now requires two synchronous writes. On a write-heavy workload, that compounds fast. An analytics pipeline writing millions of event records has no business warming the cache on each write. Write-through there wastes memory and adds latency for a cache nobody reads.

One failure mode candidates miss: if the cache write succeeds but the DB write fails, the cache now holds a value that was never persisted, and every reader sees it. Most teams write DB first, then cache, and accept a small window where the cache is stale if the cache write fails. Monitoring cache write errors matters more than most people realize.
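A minimal sketch of the DB-first ordering, assuming an in-memory dict for the database and a hypothetical `FlakyCache` class that can be switched into a failing state. The names `write_through`, `CacheUnavailable`, and the `errors` list (a stand-in for a real metric) are illustrative, not any library's API.

```python
from typing import Dict, List


class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when a write does not go through."""


class FlakyCache:
    """In-memory stand-in for a cache that can be made to fail on demand."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.failing = False

    def set(self, key: str, value: str) -> None:
        if self.failing:
            raise CacheUnavailable(key)
        self.data[key] = value


def write_through(db: Dict[str, str], cache: FlakyCache, key: str, value: str,
                  errors: List[str]) -> bool:
    """Write the DB first, then the cache. Returns True if both are in sync."""
    db[key] = value                 # if this raises, the cache is never touched
    try:
        cache.set(key, value)
        return True
    except CacheUnavailable:
        errors.append(key)          # stand-in for a metric or alert on cache write errors
        return False                # DB is correct; the cache may be stale for this key


db: Dict[str, str] = {}
cache = FlakyCache()
errors: List[str] = []

write_through(db, cache, "order:1", "paid", errors)      # both stores updated
cache.failing = True
write_through(db, cache, "order:1", "shipped", errors)   # DB updated, cache write fails
print(db["order:1"], cache.data["order:1"], errors)      # shipped paid ['order:1']
```

The last line shows the accepted window: the database is correct, the cache still serves the previous value, and the recorded error is the only signal that the two have diverged.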

Use write-through when: read-after-write consistency is required and your workload isn't dominated by writes. Financial records, inventory counts, user profiles.

Write-Back: Fastest Writes, Real Risk

Write-back (also called write-behind) takes the opposite bet. Writes go to the cache immediately and are flushed to the database asynchronously. The application gets an acknowledgment the moment the cache write succeeds.

WRITE PATH

App ──► Cache WRITE ──► ack (immediate)
                │
                ▼ (async, batched)
           DB WRITE

READ PATH

App ──► Cache GET ──► HIT ──► return data (may be ahead of DB)
              │
             MISS
              │
              ▼
        DB SELECT ──► Cache SET ──► return data

Write-back is the right choice when write throughput is the bottleneck and you can survive losing recent writes. If you're collecting 50,000 metric data points per second, a typical database will struggle to absorb that as individual synchronous writes. Write-back lets you buffer in Redis, coalesce multiple updates to the same key, and flush in efficient batches.

What you're giving up is durability. If the cache node crashes before the async flush, those writes are gone. For metrics and logs, losing the last five seconds of data is acceptable. For an order being placed, it is not.
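The buffering, coalescing, and loss window can be seen in a toy version. This is a sketch under simplifying assumptions: dicts stand in for the cache and the database, the class name `WriteBackBuffer` is hypothetical, and `flush` is called by hand where a real system would run it from a background worker with retries.

```python
from typing import Dict, Optional


class WriteBackBuffer:
    """Write-back over in-memory dicts: writes are acknowledged from the cache
    and reach the database only when flush() runs."""

    def __init__(self, db: Dict[str, float]) -> None:
        self.db = db
        self.cache: Dict[str, float] = {}
        self.dirty: Dict[str, float] = {}   # written to the cache but not yet persisted
        self.db_batches = 0                 # number of batched writes sent to the DB

    def write(self, key: str, value: float) -> None:
        self.cache[key] = value
        self.dirty[key] = value             # a later write to the same key replaces this one

    def read(self, key: str) -> Optional[float]:
        if key in self.cache:
            return self.cache[key]          # may be ahead of the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def flush(self) -> int:
        """Persist all dirty keys in one batch; return how many keys were written."""
        batch, self.dirty = self.dirty, {}
        if batch:
            self.db.update(batch)
            self.db_batches += 1
        return len(batch)


db: Dict[str, float] = {}
buf = WriteBackBuffer(db)
buf.write("cpu", 0.41)
buf.write("cpu", 0.57)      # coalesced with the previous write
buf.write("mem", 0.80)
print(db)                   # {} : a crash at this point loses all three writes
print(buf.flush(), db)      # 2 {'cpu': 0.57, 'mem': 0.8}
```

Three writes became one batch of two rows, which is where the throughput comes from. Everything in `dirty` between flushes is exactly what a cache crash would lose.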

[Image: deploying write-back on your payment service's transaction log.] When the interviewer asks if write-back is appropriate for a fintech system.

The implementation is genuinely complex: background flush workers, retry logic, handling extended DB downtime, and careful ordering to prevent out-of-sequence updates. Rather than building the flush mechanism from scratch, most teams put a durable message queue between the application and the database, or at least turn on Redis AOF persistence so that a cache restart loses less.

Use write-back when: write throughput is the primary constraint, eventual consistency is acceptable, and the system can tolerate some data loss on cache failure.

Write-Around: The Forgotten Fourth

Write-around bypasses the cache entirely on writes. Data goes straight to the database. Reads still check the cache first and load from the database on a miss.

This makes sense when you're writing data you don't expect to read soon. Bulk data imports, audit logs, historical records queried once a year. Loading them into the cache on write would evict hot data for records that sit cold for weeks.

Write-around is less a standalone strategy and more a modifier: cache-aside reads, but don't populate the cache on write.
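As a rough illustration, here is a bulk import that skips the cache, paired with a cache-aside read. The function names and sample keys are made up for the example, and it assumes the imported keys are new; if a write could update a key that is already cached, you would still need invalidation or a TTL.

```python
from typing import Dict, Iterable, Optional, Tuple


def bulk_import(db: Dict[str, str], rows: Iterable[Tuple[str, str]]) -> None:
    """Write-around: rows go straight to the database, the cache is not touched."""
    for key, value in rows:
        db[key] = value


def read(db: Dict[str, str], cache: Dict[str, str], key: str) -> Optional[str]:
    """Cache-aside read: check the cache, fall back to the DB, populate on a miss."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value


db: Dict[str, str] = {"user:1": "Ada"}
cache: Dict[str, str] = {"user:1": "Ada"}   # hot data already cached

bulk_import(db, [("audit:1", "login"), ("audit:2", "logout"), ("audit:3", "login")])
print(sorted(cache))                 # ['user:1'] : the import evicted nothing
print(read(db, cache, "audit:2"))    # logout (first read is a miss, served from the DB)
print(sorted(cache))                 # ['audit:2', 'user:1'] : cached only once requested
```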

All Four Strategies, Side by Side

Strategy	Read latency	Write latency	Consistency	Data loss risk	Best for
Cache-aside	Fast (hit) / slow (miss)	Low	Eventual	None	Read-heavy, general use
Write-through	Fast	High	Strong	None	Read-after-write critical
Write-back	Fast	Very low	Eventual	Yes	Write-heavy, high throughput
Write-around	Slow on first read	Low	Eventual	None	Write-once, rarely read

How to Talk About This in a System Design Interview

The common failure is waiting to be asked about caching. A stronger move is to raise it yourself when you identify the bottleneck.

"Reads are going to dominate here, and most of the load concentrates on a small set of popular records. I'd add Redis in front of the database using a cache-aside pattern, with a TTL of 60 seconds and explicit invalidation on writes. That handles the hot-path reads without caching every record."

That short answer tells the interviewer you know the strategy, understand why it fits, and have thought about staleness. If they push back ("what if we need immediate consistency?"), you pivot: "then write-through, which doubles write latency but guarantees the cache is always current. The tradeoff is worth it for something like order status."

Senior answers name the strategy, name the tradeoff, and name which alternative they're rejecting. Confidently ruling out three options signals more depth than listing all four with equal enthusiasm.

Where People Get This Wrong

Saying "add Redis" and moving on. Redis is infrastructure. The strategy is what matters. Specify it or the interviewer will specify a follow-up question you won't enjoy.

Forgetting cache invalidation. "We'll cache it" is half an answer. How does stale data get invalidated? Explicit delete on write, TTL, or both? Phil Karlton is widely credited with calling cache invalidation one of the two hard problems in computer science, back in the mid-1990s. He was right, and the intervening thirty years have not improved things.

Applying write-back to anything financial. The data loss risk is real. Propose write-back for a payment system's transaction log and that's a red flag the interviewer will write down, underline, and bring up at the debrief.

Caching everything. Data that changes every second goes stale almost as soon as it is cached, and data accessed once a day almost never produces a hit. Both waste memory and add invalidation work for no benefit. Be selective. Not everything deserves a seat in your cache.

Ignoring cold start. If your cache is empty and traffic spikes, every request hits the database. Cache warming and circuit breakers matter in production and they matter in the interview. The interviewer asking "what happens when you first deploy?" is not a trap. It's an invitation to show you've thought past the happy path.
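Cache warming can be as simple as preloading a list of known-hot keys before the service takes traffic. The sketch below assumes you already have such a list (from access logs, for example); `warm_cache` and the sample keys are hypothetical, and dicts again stand in for the cache and the database.

```python
from typing import Dict, Iterable


def warm_cache(db: Dict[str, str], cache: Dict[str, str], hot_keys: Iterable[str]) -> int:
    """Copy the listed keys from the DB into the cache; return how many were loaded."""
    loaded = 0
    for key in hot_keys:
        value = db.get(key)
        if value is not None:       # skip keys that no longer exist in the DB
            cache[key] = value
            loaded += 1
    return loaded


db = {"product:1": "keyboard", "product:2": "mouse", "product:3": "monitor"}
cache: Dict[str, str] = {}

# Run once at deploy time, before the instance starts serving requests.
print(warm_cache(db, cache, ["product:1", "product:2", "product:99"]))  # 2
print(sorted(cache))  # ['product:1', 'product:2']
```

Warming trades a controlled, sequential burst of DB reads at deploy time for the uncontrolled burst you would otherwise get when real traffic finds the cache empty.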

Knowing the theory gets you halfway. The harder part is explaining your reasoning out loud, under pressure, when the interviewer pushes back on your choice. That's a skill you build through repetition, not reading.

For related system design patterns, see how distributed caches work under the hood, the key-value store design walkthrough, and the full system design interview framework.