Distributed Cache System Design: A Complete Interview Walkthrough

May 27, 202612 min read
interview-prepcareerdsaalgorithms
Distributed Cache System Design: A Complete Interview Walkthrough
TL;DR
  • Clarify before drawing: confirm read/write ratio, staleness tolerance, region count, and data size before touching the whiteboard.
  • Consistent hashing moves only K/(N+1) keys on topology changes; virtual nodes fix load skew; Redis Cluster uses 16,384 fixed hash slots.
  • Cache-aside is the right default write policy; write-through trades memory for consistency; write-behind trades durability for write speed.
  • Approximated LRU with random sampling is how Redis handles eviction in production; pair every key with a TTL regardless of policy.
  • Cache stampede is solved with leases (only the first requester fetches) or a distributed mutex; hot keys need replication or suffix sharding.

The interviewer writes "design a distributed cache" on the whiteboard and steps back. Most engineers draw a box, label it Redis, and start talking about LRU eviction. They run out of things to say by minute twelve.

The distributed cache system design interview is not asking you to describe Redis. It's asking you to make a series of decisions under ambiguity, explain the tradeoffs in each one, and demonstrate that you know which decisions to make first.

A crowd cheering "the database is slow, let's just use cache for everything" while one engineer stares with quiet dread The energy in the room when someone proposes caching as the answer to everything. One of you knows what comes next.


Clarify First. Design Second.

"Design a distributed cache" leaves a dozen questions unanswered. Ask them before you draw anything.

  • What are we caching? Social graph edges, session tokens, and product catalog pages have very different consistency requirements.
  • What's the read-to-write ratio? A 100:1 ratio opens very different design choices than 10:1.
  • How stale can data be? A five-second-old user profile is usually fine. A five-second-old account balance is not.
  • Single region or multi-region? One data center simplifies the consistency story enormously. Multi-region is a different problem.
  • Do keys need TTLs? Session tokens expire. Product catalog items may not.

Propose numbers and get them confirmed. Something like: 500K reads/second, 50K writes/second, P99 latency under 5ms, 10TB of hot data, eventual consistency acceptable.

You cannot design a correct system without knowing what correct means. Interviewers who watch you skip this step write it down. Not in a good way.


The Three-Layer Sketch

Every distributed cache has three components:

  1. A client library that routes requests to the right cache node.
  2. A cache cluster of nodes holding data in memory.
  3. A backing datastore (PostgreSQL, S3, DynamoDB) that the cache lives in front of.

Three-layer cache architecture showing cache hit path (client to cache, green response) and cache miss path (client to cache to database, with populate arrow back up) Left: a cache hit. Right: a miss that fetches from the datastore and populates the cache before returning. The backing store doesn't know the difference.

The client library does more work than it looks like. It maintains the cluster topology, maps keys to nodes, handles retry logic, and detects node failures. Facebook's Memcache system extracted this into a dedicated routing component called mcrouter so application code stays oblivious to cluster topology. For smaller systems, embedding routing logic in the SDK is fine.

Sketch all three layers and the request flow before diving into any one component. Interviewers want to see that you understand the whole system before you optimize a part of it. Jumping straight to consistent hashing without drawing the architecture is like explaining tire pressure without mentioning the car.


Consistent Hashing Stops You From Crashing Your Own Database

How do you split keys across N cache nodes?

The naive answer is hash(key) % N. It works until you add a node. When N becomes N+1, almost every key maps to a different node. Your cache hit rate collapses to near zero, and you have effectively DoSed your own database with cache misses.

Consistent hashing, introduced by Karger et al. at MIT in 1997 specifically for distributed web caching, solved this. Place both nodes and keys on a ring from 0 to 2^32. A key belongs to the nearest node clockwise from its position on the ring. Adding a node only displaces the keys between it and its ring predecessor. About 1/(N+1) of all keys move, which is the theoretical minimum. With modular hashing, (N-1)/N keys move.

import hashlib import bisect class ConsistentHashRing: def __init__(self, nodes, replicas=150): self.ring = {} self.sorted_keys = [] for node in nodes: for i in range(replicas): h = self._hash(f"{node}:{i}") self.ring[h] = node self.sorted_keys.append(h) self.sorted_keys.sort() def _hash(self, value): return int(hashlib.md5(value.encode()).hexdigest(), 16) def get_node(self, key): if not self.ring: return None h = self._hash(key) idx = bisect.bisect(self.sorted_keys, h) % len(self.sorted_keys) return self.ring[self.sorted_keys[idx]]

The replicas=150 parameter creates 150 virtual node positions per physical node. Without virtual nodes, three physical nodes placed randomly on the ring might cluster together and leave one node handling 70% of traffic. Virtual nodes smooth the load distribution and let you weight more powerful machines by giving them more ring positions.

Consistent hash ring with three nodes (A, B, C) each having multiple virtual positions scattered around the ring. A key labeled hash("user:42") lands at a position and walks clockwise to Node B. Right panel compares mod-N hashing (67% remapped) vs consistent hashing (25% remapped) when adding a node. Each colored dot is a virtual slot. The key walks clockwise and lands on Node B. Add a fourth node and only Node B's immediate predecessor range moves. With mod-N hashing, two-thirds of everything moves.

Redis Cluster takes a different approach. It uses exactly 16,384 fixed hash slots, computed as CRC16(key) & 0x3FFF. That number was chosen deliberately: its slot bitmap fits in exactly 2KB, small enough to include in the periodic heartbeat messages nodes exchange without bloating the network. Each node owns a contiguous range of slots, and the ownership map is gossiped cluster-wide. The underlying goal is the same as consistent hashing: minimal key movement when the cluster topology changes. See Hash Map Time Complexity for the underlying structure that makes O(1) slot lookups work.


Replication: What Happens When a Node Dies

A cache with no replicas is a liability. When a node dies, every key it owned is a miss until the backing datastore re-serves it. Suddenly your database is doing all the work it thought the cache was doing. This is how you find out your database was sized assuming the cache existed.

The standard pattern is primary-replica replication. Each shard has one primary that accepts writes and one or two replicas that serve reads. Redis Cluster does exactly this. Replication is asynchronous by default: the write returns as soon as the primary acknowledges it, and replicas catch up shortly after.

Async replication means you will lose very recent writes if a primary dies before they replicate. For a cache, this is almost always acceptable. The application re-fetches and re-populates the key from the database. The data is not lost, only the cached copy.

A replication factor of 3 (one primary, two replicas) is the standard default. It survives one node failure with no availability impact on reads and no data loss risk.

Synchronous replication roughly doubles write latency because the primary waits for a replica to confirm before returning. Worth it for session tokens. Not worth it for product thumbnail URLs.


Three Write Policies. One Right Answer for Most Systems.

How data gets into the cache is the question that separates engineers who have read about caching from engineers who have thought about it.

Three write policy flows side by side: cache-aside (miss fetches from DB and populates), write-through (every write goes to cache and DB simultaneously), write-behind (write to cache immediately, flush to DB async) Cache-aside on the left. Write-through in the middle. Write-behind on the right. Each arrow tells you what you are paying for.

Cache-aside (lazy loading). The application checks the cache on every read. On a miss, it fetches from the database, writes the result to the cache, and returns it. On a write, it updates the database and invalidates the cached key. The cache only holds data that someone has actually requested.

This is the right default for read-heavy systems. Cold starts cause a spike of misses, but the cache warms quickly. The downside: the first access after a miss or invalidation is always slow. You will notice this every time you deploy.

Write-through. Every write goes to both the database and the cache atomically. The cache is always warm. The cost is that you pay cache memory for every write even if no one ever reads that key. Write-through is good when read-after-write consistency matters and you can afford the memory overhead.

Write-behind (write-back). Writes go to the cache immediately and flush to the database asynchronously in batches. Write latency is minimal and batching reduces database pressure. The risk: if a cache node dies before the async flush, those writes are permanently gone. Write-behind is a specialized tool, not a default.

State which you are picking and why. Cache-aside with TTL-based invalidation is the right default for most systems. Deviate only when you have a specific reason, and say the reason out loud.


LRU Eviction: The Default That Actually Works

When memory is full, something has to go. See LRU Cache implementation for the full mechanics. In a distributed cache, the eviction policy is set per node, and every node runs it independently.

LRU (Least Recently Used) is the right starting point. Redis does not implement a true LRU because tracking exact access timestamps for every key is expensive. Instead, it samples five random keys (configurable via maxmemory-samples), picks the one with the oldest access time, and evicts it. It also maintains an eviction pool of 16 candidates across rounds, so accuracy compounds without a full scan. Good enough for almost every production system.

Raise maxmemory-samples to 10 if you want better accuracy at a small CPU cost. Lower it to 3 for extremely high-throughput nodes where the CPU overhead of sampling matters.

LFU (Least Frequently Used) is better when a small set of keys is permanently hot. A key accessed once yesterday should not survive over a key accessed ten thousand times. Redis 4.0 added LFU as a configurable option. For most use cases, LRU is fine.

TTL is orthogonal to eviction. Every key can carry an expiration. Redis handles it with lazy expiration (check on access, delete if expired) plus a background sweep that samples random expired keys periodically. The combination keeps memory clean without a full scan over all keys. Set TTLs on everything by default, even if you think the data never goes stale. You will be wrong about that.


Two Problems That Separate Good Answers From Strong Hire

If you have covered the above clearly, your interviewer will push on failure modes. Two come up in nearly every cache interview.

Cache stampede (thundering herd). A popular key expires. Thousands of concurrent requests miss simultaneously and all query the database at once. The database collapses under load it was never sized for.

Facebook's 2013 NSDI paper on Scaling Memcache describes the fix they shipped to production: leases. When a key misses, the cache grants the first requesting client a lease token and tells all other clients to wait 100ms and retry. Only the lease holder fetches from the database and populates the key. The lease is automatically invalidated if a concurrent write invalidates the key before the fetch completes. One database query instead of thousands.

A simpler approach for smaller systems: use a distributed lock or a local mutex. The first request acquires it, fetches and populates the key, and releases it. Subsequent requests wait briefly and receive the cached result.

Hot keys. Consistent hashing assigns all traffic for a given key to one node. If that key is requested 200K times per second, that node is the bottleneck regardless of how large the rest of the cluster is.

Left panel shows one node overwhelmed at 99% CPU with all traffic arrows pointing to it while two other nodes sit completely idle. Right panel shows the same traffic distributed across three nodes via sharded keys product:123:0 through product:123:9. Consistent hashing guarantees one key, one node. Forever. The cluster is only as healthy as its hottest key.

Two fixes: replicate hot keys explicitly to multiple nodes and load-balance reads across all copies, or shard the key by appending a random suffix (product:123:0 through product:123:9) and merge at read time.

Both problems share the same root cause: the assumption that load distributes evenly. It never does in production. Flag this in the interview even if you are not asked about it. Showing you anticipate failure modes without being prompted is exactly what makes the write-up read differently afterward.


Use the Clock, Don't Fight It

TimeFocus
0-5 minClarifying questions, confirm target numbers
5-10 minThree-layer architecture sketch
10-20 minConsistent hashing, virtual nodes, Redis Cluster slots
20-30 minReplication, write policies, eviction and TTL
30-40 minFailure modes: stampede, hot keys
40-45 minCapacity math, tradeoff summary

Do not spend more than ten minutes on any one component without checking that your interviewer wants depth there. The tradeoff discussion at the end is what moves a candidate from "good" to "strong hire."

The interviewer already knows the material. They are watching whether you can reason through it aloud, make decisions confidently, and explicitly flag what you chose not to optimize. "I am going with cache-aside because it is the safest default and we can tighten consistency later if requirements change" is stronger than a perfect write-through implementation delivered in silence.

That narration under pressure is a real skill. It requires practice in the actual condition: talking while thinking, out loud, in real time. SpaceComplexity runs voice-based system design mocks with rubric feedback on both the technical decisions and how clearly you communicated them.


Distributed Cache System Design: Quick Recap

  • Clarify before designing. Read/write ratio, staleness tolerance, single vs. multi-region, data size. None of this is optional.
  • Consistent hashing moves O(K/N) keys when the cluster changes. Virtual nodes fix load imbalance. Redis Cluster uses 16,384 fixed hash slots as a pragmatic alternative.
  • Primary-replica replication, async by default, replication factor 3. Add sync replication only for correctness-critical data.
  • Cache-aside is the right default write policy. Write-through trades memory for consistency. Write-behind trades durability for write speed.
  • Approximated LRU with maxmemory-samples 5 is the Redis default. LFU for stable hot-key patterns. Always use TTLs.
  • Cache stampede: leases or a mutex. Hot keys: replicate or shard. State your reasoning out loud.

Further Reading