Design a CDN: The System Design Interview Walkthrough

A user in Bucharest requests a 4K video. The origin server is in Oregon. The speed of light gives you roughly 100ms one-way, 200ms round-trip. Every time the player needs the next segment, that's a 200ms toll before a single byte moves. The video would be unwatchable.

CDNs exist to fight physics. You can't move the data center, so you move the data. The core idea is simple: cache content as close to the user as possible, in hundreds of locations worldwide, so most requests never touch the origin at all. The speed of light does not care about your SLA.

This is one of the most common system design interview questions in senior-level loops. Here is a full walkthrough: requirements, architecture, data model, tradeoffs, and how to pace yourself across 45 minutes.

Start With Clarification

Before touching architecture, you need to narrow the problem. A CDN for a static asset host looks nothing like a CDN for live video.

Ask these:

What kind of content? Static assets (JS, CSS, images) vs video vs dynamic API responses. Each has very different cacheability.
Pull or push? Does the CDN fetch content on first request, or do you pre-populate edge nodes?
What is the invalidation requirement? Can content be stale for minutes? Or must changes propagate in seconds?
Who are the users geographically? A global consumer product needs PoPs on six continents. A US-only SaaS needs far fewer.
What scale? Daily active users, peak QPS, average object size.

For this walkthrough, scope it as: a global pull CDN for a streaming platform, serving video segments and static assets, with a target cache hit ratio above 95% and a purge API for content updates. Peak load: 500K requests per second globally.

The Math Behind the Hit Rate

At 500K RPS with a 95% cache hit rate, the origin sees 25K RPS. If the hit rate drops to 80%, the origin sees 100K RPS, four times the load. Hit rate isn't a vanity metric.

Average video segment: 2MB. Static asset: 50KB. Weighted average request: ~500KB. At 500K RPS, that's roughly 250 GB/s of total egress. Most of that flows from edge nodes, not origin.
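
The arithmetic above is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it uses decimal units (1 GB = 1,000,000 KB).

```python
def origin_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that miss every cache and reach origin."""
    return total_rps * (1.0 - hit_ratio)


def egress_gb_per_s(total_rps: float, avg_object_kb: float) -> float:
    """Total egress in GB/s, using decimal units (1 GB = 1,000,000 KB)."""
    return total_rps * avg_object_kb / 1_000_000


PEAK_RPS = 500_000  # peak load from the scoping section

for ratio in (0.95, 0.80):
    print(f"hit ratio {ratio:.0%}: origin sees {origin_rps(PEAK_RPS, ratio):,.0f} RPS")

# ~500 KB weighted average object size, as estimated above
print(f"total egress: {egress_gb_per_s(PEAK_RPS, 500):.0f} GB/s")
```

Dropping from 95% to 80% looks like a small change in the hit ratio, but the origin sees the miss ratio, which quadruples.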

Hot content (the top 1% of titles) accounts for roughly 80% of traffic. Cold content (long-tail catalog) is served far less frequently and may not survive LRU eviction between requests.

Three Tiers. No More, No Less.

Three tiers. Every CDN of real scale uses them. Sketch two in your interview and you'll spend the rest of it explaining why you left out the important one.

User → Edge PoP → Mid-Tier (Shield) → Origin

Edge PoPs (Points of Presence) are small data centers near population centers, internet exchange points, or inside ISP networks. Cloudflare operates 300+. Akamai operates servers in 130+ countries. Netflix's Open Connect embeds appliances directly inside ISP facilities, which lets it serve a reported 95% of its global traffic without touching the public internet at all.

An edge node handles the request if the content is cached there. Cache hit: serve and done. Cache miss: ask the mid-tier.

The mid-tier (shield) is the key insight most candidates skip. Without it, a cache miss at any of 200 global edge nodes independently fetches from origin. A single uncached asset can generate 200 simultaneous origin requests, one per PoP. With a mid-tier, each edge routes its misses to its regional shield node, which makes a single request to origin and caches the response for every edge in that region. This collapses origin fan-out from O(PoPs) to O(regions): with 3 to 5 regions, 200 potential origin requests become 3 to 5.

Cloudflare calls this Tiered Cache. Fastly calls it shielding. The concept is identical: one designated node per region absorbs all origin traffic for that region. It sounds obvious once you hear it. Most candidates skip it entirely.

The origin sits behind the shield tier. It only sees misses that penetrate all cache layers. At 95% hit rate, it handles 5% of total traffic. At good architectural health, the origin runs comfortably within its capacity even during traffic spikes.
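
To make the fan-out argument concrete, here is a toy simulation. It is a sketch under simplifying assumptions: the class names are hypothetical, there is no TTL or eviction, requests arrive one at a time, and edges are assigned to shields round-robin rather than by geography.

```python
class Origin:
    """Counts how many fetches actually reach the origin."""

    def __init__(self) -> None:
        self.fetches = 0

    def get(self, key: str) -> bytes:
        self.fetches += 1
        return f"content for {key}".encode()


class CacheNode:
    """A cache that falls through to its upstream (a shield or the origin) on a miss."""

    def __init__(self, upstream) -> None:
        self.upstream = upstream
        self.store = {}

    def get(self, key: str) -> bytes:
        if key not in self.store:  # miss: fetch upstream once, then keep it
            self.store[key] = self.upstream.get(key)
        return self.store[key]


def origin_fetches(num_edges: int, num_shields: int) -> int:
    """Request one cold asset from every edge and return the origin fetch count.

    num_shields=0 models a flat (edge -> origin) topology.
    """
    origin = Origin()
    shields = [CacheNode(origin) for _ in range(num_shields)]
    edges = [
        CacheNode(shields[i % num_shields] if shields else origin)
        for i in range(num_edges)
    ]
    for edge in edges:
        edge.get("/video/seg-001.ts")
    return origin.fetches


print(origin_fetches(200, 0))  # flat: 200 origin fetches, one per PoP
print(origin_fetches(200, 4))  # tiered: 4 origin fetches, one per shield
```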

Figure: Three-tier CDN architecture. User requests fan out to edge PoPs, cache misses coalesce at the mid-tier shield, and the shield makes a single request to origin per region.

Edge nodes serve cache hits. Misses go to the shield. The shield goes to origin once. Everyone else waits.

Request Routing: How Do Users Find the Nearest Edge?

Two approaches. Most large CDNs use a hybrid.

BGP Anycast: The CDN advertises the same IP address from every PoP simultaneously. BGP, the internet's routing protocol, naturally routes each user to the topologically nearest PoP. No DNS, no application logic. Failover is fast when a PoP goes down, because BGP withdraws the route and reconverges: typically within seconds, and sub-second in well-tuned networks. The tradeoff: you give up deterministic control. BGP routes on network topology, not geographic distance, so "nearest" by BGP hop count isn't always nearest by latency.

DNS geo-routing: The CDN returns different IP addresses from its authoritative DNS based on the resolver's location. More control. You can implement logic like "EU users go to Frankfurt; if Frankfurt is degraded, fail over to Amsterdam, never to US-East." The tradeoff: DNS TTLs create lag. If you set a 60-second TTL and a PoP fails, it takes up to 60 seconds for clients to re-resolve to a healthy node.

Most large CDNs use Anycast for their authoritative DNS servers and DNS geo-routing (GSLB) for content traffic. Anycast makes the DNS layer fast and resilient; GSLB gives you control over which PoP a user actually lands on.
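
The "Frankfurt, then Amsterdam, never US-East" rule from the DNS geo-routing paragraph can be sketched as an ordered preference list per region. Everything here is an assumption for illustration: the policy table, the PoP names, and the addresses (which come from the reserved documentation ranges). A real GSLB would also weigh load and measured latency.

```python
from typing import Optional

# Hypothetical policy: ordered PoP preferences per client region.
# A PoP that is not listed for a region is never used for it.
FAILOVER_POLICY = {
    "EU": ["frankfurt", "amsterdam"],  # EU traffic never fails over to us-east
    "US": ["us-east", "us-west"],
}

# Hypothetical PoP addresses (RFC 5737 documentation ranges).
POP_IPS = {
    "frankfurt": "192.0.2.10",
    "amsterdam": "192.0.2.20",
    "us-east": "198.51.100.10",
    "us-west": "198.51.100.20",
}


def resolve(region: str, healthy_pops: set) -> Optional[str]:
    """Return the IP of the first healthy PoP allowed for this region, if any."""
    for pop in FAILOVER_POLICY.get(region, []):
        if pop in healthy_pops:
            return POP_IPS[pop]
    return None  # no permitted PoP is healthy; the caller decides what to answer


print(resolve("EU", {"frankfurt", "amsterdam", "us-east"}))  # 192.0.2.10
print(resolve("EU", {"amsterdam", "us-east"}))               # 192.0.2.20
print(resolve("EU", {"us-east"}))                            # None
```

Whatever this function returns still has to age out of resolver caches, which is where the TTL lag described above comes from.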

Figure: BGP Anycast vs DNS geo-routing. Anycast advertises the same IP from all PoPs and relies on BGP topology, while DNS geo-routing returns different IPs per region with TTL-controlled failover.

Most large CDNs run Anycast for their DNS tier and GSLB for content traffic.

The Cache Layer: Where Hit Rates Are Won and Lost

Cache key construction is where most candidates lose points. Say "just use the URL" and watch your interviewer's face. The cache key determines when two requests get the same cached response.

A naive key is just the URL. That breaks immediately:

?utm_source=google&id=123 and ?id=123&utm_source=google are the same resource but different strings
A gzip-compressed response and a brotli response for the same URL are different objects
Mobile and desktop might receive different content for the same URL

A correct cache key is: scheme + host + normalized_path + sorted_filtered_query_params + content_negotiation_headers

Figure: Cache key construction. A URL is decomposed into scheme, host, path, filtered and sorted query params, and Vary headers; tracking params like utm_source and session are stripped before assembling the final key.

Tracking params that don't affect content get stripped. Same resource, same cache key.

Normalization means sorting query params alphabetically and stripping analytics tokens (utm_*, fbclid, etc.) that don't affect content. Stripping junk query params is one of the highest-impact optimizations for cache hit rate. A marketing campaign that appends ?utm_campaign=spring_sale to every URL will bypass the cache entirely without normalization. Your hit rate craters. Your origin bill doubles. Marketing sends a Slack message asking why the site is slow.
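
A minimal sketch of that normalization, using only the standard library. The function name and the exact list of tracking parameters are assumptions, and content-negotiation headers are left out because they depend on the response's Vary header.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumed deny-list; a real deployment would make this configurable.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid"}


def is_tracking(name: str) -> bool:
    name = name.lower()
    return name in TRACKING_PARAMS or name.startswith(TRACKING_PREFIXES)


def cache_key(url: str) -> str:
    """Build scheme + host + path + sorted, filtered query params."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking(name)
    ]
    params.sort()  # parameter order must not change the key
    key = f"{parts.scheme}://{parts.netloc.lower()}{parts.path or '/'}"
    query = urlencode(params)
    return f"{key}?{query}" if query else key


a = cache_key("https://Example.com/p?utm_source=google&id=123")
b = cache_key("https://example.com/p?id=123&utm_source=google")
print(a)       # https://example.com/p?id=123
print(a == b)  # True
```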

The Vary header on responses tells the CDN which request headers to include in the key. Vary: Accept-Encoding means the CDN stores separate objects for gzip vs brotli. Vary: Accept-Language means one object per language. Vary on too many headers and you fragment the cache. Vary on too few and you serve the wrong variant.
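
One way to keep Vary from fragmenting the cache is to normalize the header value before it enters the key. The sketch below assumes a hypothetical `variant_key` helper that appends the Vary-selected request headers to a base key, and it collapses Accept-Encoding into three buckets. It ignores q-values (including `q=0`) and lowercases other header values, both of which are simplifications.

```python
def normalize_accept_encoding(value: str) -> str:
    """Collapse the many possible Accept-Encoding strings into br, gzip or identity."""
    offered = {token.split(";")[0].strip().lower() for token in value.split(",")}
    for preferred in ("br", "gzip"):
        if preferred in offered:
            return preferred
    return "identity"


def variant_key(base_key: str, vary: str, request_headers: dict) -> str:
    """Extend base_key with the request headers named in the response's Vary header."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    if not names:
        return base_key
    parts = []
    for name in names:
        value = headers.get(name, "")
        if name == "accept-encoding":
            value = normalize_accept_encoding(value)
        else:
            value = value.strip().lower()
        parts.append(f"{name}={value}")
    return base_key + "|" + "|".join(parts)


chrome = variant_key("k", "Accept-Encoding", {"Accept-Encoding": "gzip, deflate, br"})
other = variant_key("k", "Accept-Encoding", {"accept-encoding": "br;q=1.0, gzip;q=0.8"})
print(chrome)           # k|accept-encoding=br
print(chrome == other)  # True: two different header strings, one cached variant
```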

Pull vs Push

Pull CDN: Edge fetches from origin on first miss, caches the response. Subsequent requests hit cache. Simple to operate. The first user after a deploy gets a cold miss and slow response. For long-tail content that's rarely requested, the cache stays cold and misses keep hitting origin.

Push CDN: You pre-populate edge nodes with content before any user requests it. Netflix's Open Connect uses this: during off-peak hours, Netflix pushes popular titles to ISP-embedded appliances, so the first user request is always a cache hit. Good for large files with predictable demand. Bad for unpredictable traffic or when your catalog is too large to pre-populate everywhere.

Most general-purpose CDNs default to pull. Netflix-scale streaming with a known catalog uses push. If you are not Netflix, you probably want pull.

TTL and Freshness

Cache-Control response headers tell the CDN how long to keep content:

Cache-Control: s-maxage=86400, max-age=3600

s-maxage targets shared caches (CDNs). max-age targets browsers. For a shared cache, s-maxage overrides max-age when both are present; browsers ignore s-maxage and use max-age. An 86400 second TTL means the CDN holds the object for 24 hours. If the origin updates the file, users get stale content until the TTL expires or the object is purged.
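
The precedence rule can be sketched as a small parser. This is a simplified illustration, not a complete implementation of HTTP caching: the function names are made up, and it only understands `s-maxage`, `max-age`, `no-store` and `private`.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(value: str) -> Optional[int]:
    """TTL in seconds for a shared cache (CDN), or None if it should not store."""
    directives = parse_cache_control(value)
    if "no-store" in directives or "private" in directives:
        return None
    for name in ("s-maxage", "max-age"):  # s-maxage takes precedence for shared caches
        arg = directives.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None  # no explicit freshness lifetime given


print(shared_cache_ttl("s-maxage=86400, max-age=3600"))  # 86400
print(shared_cache_ttl("max-age=3600"))                  # 3600
print(shared_cache_ttl("private, max-age=600"))          # None
```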

The core tension: longer TTL = higher hit rate = lower origin load = faster delivery. Shorter TTL = fresher content = more origin requests = more cost.

Static assets with hash-based filenames (app.a1b2c3.js) can use long TTLs because the filename changes on each deploy. HTML files that reference those assets need short TTLs so browsers pick up the new filenames. The pattern: long TTL + filename fingerprinting, not short TTL everywhere. It sounds obvious. Interview candidates still suggest 30-second TTLs on everything. Don't.
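
Fingerprinting is usually done by the build tool, but the idea fits in a few lines. In this sketch the helper names, the 8-character digest length, and the specific TTL values are all assumptions chosen for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def cache_control_for(path: str) -> str:
    """Short TTL for HTML entry points, long TTL for fingerprinted assets."""
    if path.endswith(".html"):
        return "s-maxage=60, max-age=0"  # assumed value: pick up new filenames quickly
    return "s-maxage=31536000, max-age=31536000, immutable"  # assumed value: one year


v1 = fingerprint("static/app.js", b"console.log('v1')")
v2 = fingerprint("static/app.js", b"console.log('v2')")
print(v1, cache_control_for(v1))
print(v1 != v2)  # True: new content means a new URL, so no purge is needed
```

Because a changed file gets a new URL, the old cached object never has to be invalidated. It simply stops being referenced.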

Cache Invalidation

Two mechanisms:

URL purge: Delete specific cached objects by URL. Simple and surgical. Works well when you know exactly what changed. A typical API shape: DELETE /cache?url=https://example.com/assets/logo.png. Most CDN providers rate-limit purge requests, for example to 500 per minute, and the exact limit varies by provider and plan.

Surrogate keys (cache tags): Attach string tags to responses via the Surrogate-Key or Cache-Tag response header. Surrogate-Key: product-123 category-shoes. When product 123 changes, a single purge request for tag product-123 invalidates every cached response carrying that tag, regardless of URL. An e-commerce CDN can attach the product ID to every asset related to that product, then purge all of them with one API call on a product update.

Figure: Surrogate key invalidation. A single PURGE request for tag product-123 looks up the tag index and marks every associated cache entry stale across all URLs.

One API call, many invalidations. The tag index does the fanout.

What Lives on Each Edge Node

Each edge node maintains a local cache store. The conceptual schema for a cache entry:

from dataclasses import dataclass, field

@dataclass
class CacheEntry:
    key: str                    # normalized cache key
    body: bytes                 # response body
    status: int                 # HTTP status code
    headers: dict[str, str]     # response headers to replay
    cached_at: float            # Unix time (seconds) when stored
    expires_at: float           # cached_at + s-maxage
    tags: list[str] = field(default_factory=list)  # surrogate keys for tag-based purge
    size_bytes: int = 0         # size of body in bytes

The tag-to-key index lives alongside the cache store:

TagIndex = {
    "product-123": {"key1", "key2", "key3"},
    "category-shoes": {"key1", "key4"},
}

Purging a tag = look up the tag in the index, mark all associated keys as stale, remove them from the index. The next request for any of those keys triggers an origin fetch and re-caches.
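
The purge path described above can be sketched with two dictionaries. The class and method names are hypothetical, and the sketch makes one simplification: it deletes purged entries outright instead of marking them stale, so it cannot serve stale content while revalidating.

```python
from collections import defaultdict


class TaggedCache:
    """In-memory cache with a tag -> keys index for surrogate-key purges."""

    def __init__(self) -> None:
        self.entries = {}                  # cache key -> response body
        self.key_tags = {}                 # cache key -> set of tags (reverse index)
        self.tag_index = defaultdict(set)  # tag -> set of cache keys

    def put(self, key: str, body: bytes, tags) -> None:
        self.remove(key)  # drop stale index entries if the key is being replaced
        self.entries[key] = body
        self.key_tags[key] = set(tags)
        for tag in tags:
            self.tag_index[tag].add(key)

    def remove(self, key: str) -> None:
        self.entries.pop(key, None)
        for tag in self.key_tags.pop(key, set()):
            keys = self.tag_index.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:  # keep the index free of empty tags
                    del self.tag_index[tag]

    def purge_tag(self, tag: str) -> int:
        """Remove every entry carrying the tag; return how many were purged."""
        keys = list(self.tag_index.get(tag, ()))
        for key in keys:
            self.remove(key)
        return len(keys)


cache = TaggedCache()
cache.put("key1", b"...", ["product-123", "category-shoes"])
cache.put("key2", b"...", ["product-123"])
cache.put("key3", b"...", ["product-123"])
cache.put("key4", b"...", ["category-shoes"])
print(cache.purge_tag("product-123"))  # 3
print(sorted(cache.entries))           # ['key4']
```

The reverse index (`key_tags`) is what keeps the tag index consistent when an entry is evicted or overwritten for reasons other than a purge.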

Eviction is LRU or LFU. LRU evicts the least recently accessed item when the store is full. LFU evicts the least frequently accessed. Most CDNs use approximate LRU (similar to Redis's approach: sample N random entries, evict the oldest among the sample). Exact LRU at scale requires a doubly linked list that becomes a locking bottleneck.
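
Sampled eviction is short enough to sketch. This toy version assumes a logical clock instead of wall time and counts entries rather than bytes. The class name is hypothetical, and copying the key list on each eviction is O(n), which a production cache would avoid.

```python
import random


class SampledLRUCache:
    """Approximate LRU: on overflow, sample a few keys and evict the least recently used."""

    def __init__(self, capacity: int, sample_size: int = 5, rng=None) -> None:
        self.capacity = capacity
        self.sample_size = sample_size
        self.rng = rng or random.Random()
        self.values = {}
        self.last_access = {}  # key -> logical time of last access
        self.clock = 0

    def _touch(self, key) -> None:
        self.clock += 1
        self.last_access[key] = self.clock

    def get(self, key):
        if key not in self.values:
            return None
        self._touch(key)
        return self.values[key]

    def put(self, key, value) -> None:
        if key not in self.values and len(self.values) >= self.capacity:
            self._evict_one()
        self.values[key] = value
        self._touch(key)

    def _evict_one(self) -> None:
        # Copying the keys is O(n); fine for a sketch, too slow for a real cache.
        keys = list(self.values)
        candidates = self.rng.sample(keys, min(self.sample_size, len(keys)))
        victim = min(candidates, key=self.last_access.__getitem__)
        del self.values[victim]
        del self.last_access[victim]


cache = SampledLRUCache(capacity=3, sample_size=3)  # sample == population: exact LRU
for name in ("a", "b", "c"):
    cache.put(name, name.upper())
cache.get("a")           # "a" is now the most recently used
cache.put("d", "D")      # evicts "b", the least recently used
print(sorted(cache.values))  # ['a', 'c', 'd']
```

No shared linked list needs updating on every read, only a per-key timestamp, which is why this approach avoids the locking bottleneck of exact LRU.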

TLS at the Edge

The CDN terminates TLS at the edge node. The browser's TLS handshake completes against a server 10-50ms away, not 200ms away. TLS termination at the edge reduces connection setup time by 40-60% compared to carrying the handshake to origin. The CDN then communicates with origin over a persistent HTTPS connection (or HTTP internally in trusted network segments). Certificate management is centralized at the CDN provider.

The Tradeoffs Worth Arguing About

Decision	Option A	Option B	Pick when
Content population	Pull (lazy fetch)	Push (pre-populate)	Pull for general purpose; Push for large predictable catalog
Request routing	BGP Anycast	DNS geo-routing	Anycast for sub-second failover; DNS for fine routing control
Tiering	Flat (edge → origin)	Tiered (edge → shield → origin)	Always tier at scale; flat only for tiny deployments
TTL length	Short (seconds)	Long (hours/days)	Long TTL + filename fingerprinting beats short TTL
Invalidation	URL purge	Surrogate keys	Surrogate keys when one event invalidates many URLs

The Thundering Herd Problem (Your Future 3am Wake-Up Call)

A popular asset expires simultaneously on all edge nodes. Every node misses and requests origin at the same time. Origin sees a sudden spike proportional to the number of edge nodes, not user count. This is a great way to cause an incident. Most engineers learn about request coalescing right after causing one.

Two mitigations:

Request coalescing: When multiple concurrent requests for the same uncached key arrive at an edge node, only one request goes upstream. The rest wait for the first response and all receive it. The edge serializes the origin fetch.

Staggered expiration: Add a small random jitter to TTLs. Instead of all copies of logo.png expiring at exactly T+86400, they expire between T+86000 and T+87000. The cache fill pattern spreads across time.
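
Jitter is a one-liner in practice. The sketch below assumes a symmetric window of plus or minus 500 seconds around the base TTL, which is close to (but not exactly) the range in the example above. The function name and parameters are hypothetical.

```python
import random


def jittered_ttl(base_ttl: int, max_jitter: int = 500, rng=random) -> int:
    """Return base_ttl shifted by a random offset in [-max_jitter, +max_jitter] seconds."""
    return base_ttl + rng.randint(-max_jitter, max_jitter)


rng = random.Random(42)  # seeded only so the demo is repeatable
ttls = [jittered_ttl(86_400, rng=rng) for _ in range(6)]
print(ttls)  # six edge nodes, six slightly different expiry times
```

Each edge applies the jitter independently when it stores the object, so copies cached at the same moment no longer expire at the same moment.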

Figure: Thundering herd. Without coalescing, all 6 expired edge nodes hit origin simultaneously; with coalescing, only one request reaches origin while the rest wait and share the response.

Coalescing serializes the origin fetch. One request goes upstream; everyone waiting gets the same response.
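
Request coalescing within a single node can be sketched with asyncio by keeping one in-flight task per key. The class name and the fake `slow_origin` function are assumptions for the demo, and a production version would also need timeouts and a policy for failed fetches.

```python
import asyncio


class CoalescingFetcher:
    """Ensures at most one upstream fetch per key is in flight at any time."""

    def __init__(self, fetch_upstream) -> None:
        self._fetch_upstream = fetch_upstream  # async callable: key -> bytes
        self._in_flight = {}                   # key -> asyncio.Task

    async def get(self, key: str) -> bytes:
        task = self._in_flight.get(key)
        if task is None:
            # First request for this key: start the fetch and remember it.
            task = asyncio.create_task(self._fetch_upstream(key))
            self._in_flight[key] = task
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        # shield() keeps one cancelled waiter from cancelling the shared fetch.
        return await asyncio.shield(task)


async def demo():
    calls = 0

    async def slow_origin(key: str) -> bytes:  # stand-in for a real origin fetch
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)
        return f"body of {key}".encode()

    fetcher = CoalescingFetcher(slow_origin)
    bodies = await asyncio.gather(*(fetcher.get("/logo.png") for _ in range(100)))
    return calls, len(set(bodies))


print(asyncio.run(demo()))  # (1, 1): one origin call, one shared response
```

The same pattern applies at the shield tier, where it collapses concurrent misses from many edges rather than from many users.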

Where Things Break at Scale

Origin egress cost: Every cache miss costs a round-trip to origin plus egress bandwidth fees. Each percentage point of hit rate improvement at 500K RPS eliminates 5K RPS of origin load. Hit rate is the primary cost lever.

Hot content at a single PoP: A viral video may be cached in only one PoP because no user in other regions has requested it yet, so the first wave of requests everywhere else misses. Solve with proactive replication: monitor rising request rates and push popular content to additional PoPs before it goes viral. Netflix identifies "trending" content and pre-positions it before peak hours.

Cache key fragmentation: A URL with 40 different query parameter combinations generates 40 separate cache entries for the same underlying content. Strip tracking params from cache keys. Use No-Vary-Search where supported to declare certain params irrelevant to content identity.

Shield node as SPOF: The shield tier is a single node per region. If it fails, every edge in the region falls through to origin. Mitigation: run two shield nodes per region in active-passive or active-active configuration.

CDN System Design Interview: The 45-Minute Clock

Time	Focus
0-5 min	Clarifying questions: content type, pull vs push, invalidation latency, geography, scale
5-12 min	High-level three-tier architecture: edge PoPs, mid-tier shield, origin
12-22 min	Cache mechanics: cache key design, TTL strategy, pull vs push, coalescing
22-32 min	Routing (Anycast vs DNS), invalidation (URL purge vs surrogate keys), TLS at edge
32-42 min	Bottlenecks and tradeoffs: hit rate math, thundering herd, hot content replication
42-45 min	Wrap-up, unsolved problems, interviewer questions

The most common mistake: spending 20 minutes on the basic two-tier diagram and leaving no time for cache key design and invalidation. Those are the hard parts. Get to them. Interviewers have seen the two-tier diagram a hundred times. Cache key construction is where the conversation gets interesting.

CDN design and distributed systems interviews reward candidates who can articulate why each component exists, not just that it exists. Explaining that the shield tier exists to eliminate O(PoPs) origin fan-out is the kind of reasoning that fills an interviewer's write-up.

What to Take Into the Room

A CDN fights physics by caching content near users across hundreds of PoPs.
Three tiers: edge PoPs, mid-tier shield, origin. The shield collapses origin fan-out.
Routing uses BGP Anycast (sub-second failover, less control) or DNS geo-routing (more control, TTL lag).
Cache key = scheme + host + normalized path + sorted filtered query params + Vary headers. Normalization is the highest-leverage hit rate optimization.
Pull CDN is the default; push CDN suits large catalogs with predictable demand.
Long TTLs plus filename fingerprinting beat short TTLs everywhere.
Invalidation: URL purge for targeted updates; surrogate keys for bulk invalidation.
Request coalescing and staggered TTL expiration prevent thundering herd.
TLS terminates at the edge: 40-60% faster connection setup.