CDN and Edge Caching: What Every System Design Interview Tests

Most candidates can say "add a CDN" in a system design interview. Genuinely, "add a CDN" is the system design equivalent of "drink water and get more sleep" in medical advice: technically correct, completely unactionable. When the interviewer follows up with "how does CDN caching work end-to-end" or "what's your invalidation strategy," those same candidates tend to go very quiet very fast.

That gap is what this article covers.

Your Origin Server Shouldn't Handle Every Request

A CDN (Content Delivery Network) is a globally distributed network of servers that serves content from a location physically close to the user instead of routing every request back to your origin. The latency payoff is real: a request from Tokyo to a San Francisco origin might take around 150ms round-trip. A request from Tokyo to a Tokyo edge node can take just a few milliseconds.

The secondary benefit matters just as much: CDNs absorb traffic before it reaches your origin. A popular YouTube video might get 10 million requests per hour. If those all hit your origin, you need to scale for 10 million reads per hour. If 99% are served from edge caches, you need to scale for the remaining 100,000, and most of those are cache misses on content that just refreshed.

That's the difference between "buy more servers" and "buy roughly one server."
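The arithmetic is worth being able to do out loud. A minimal sketch, assuming a hypothetical helper named origin_requests and the request volume from the example above:

```python
def origin_requests(total_requests: int, hit_ratio: float) -> int:
    """Requests that fall through the edge caches and reach origin."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return round(total_requests * (1.0 - hit_ratio))


hourly = 10_000_000  # requests per hour, taken from the example above
for ratio in (0.0, 0.90, 0.99):
    misses = origin_requests(hourly, ratio)
    print(f"hit ratio {ratio:.0%}: {misses:,} origin requests/hour "
          f"(~{misses / 3600:,.0f} per second)")
```

At a 99% hit ratio that is 100,000 origin requests per hour, or roughly 28 per second.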

Major providers include Cloudflare, Akamai, AWS CloudFront, Fastly, and Google Cloud CDN.

How Your Request Finds the Nearest Edge

When a user hits images.yourapp.com, DNS resolves that hostname to a CDN IP address. Many modern CDNs use Anycast: the same IP address is announced from hundreds of locations simultaneously. BGP, the protocol that routes internet traffic, then picks the best path it knows to one of those locations, which is usually a nearby one. The user lands on a close Point of Presence (PoP) without knowing it.

That PoP is an edge server. It holds a local cache. If the requested object is in cache and not expired, the edge returns it immediately. If not, the request escalates.

The user never picked the edge node. The internet's routing infrastructure picked it for them. Which is, frankly, more competent than most manually configured load balancers.

Three Tiers Stand Between the User and Your Origin

A CDN is rarely a single cache sitting between the user and your origin. There are usually three tiers.

User → Edge PoP (metro-level) → Origin Shield (regional) → Origin

The origin shield is the mechanism that makes origin offload real. Without it, every edge node that misses would make its own trip to origin. If thirty edge nodes miss at the same time, that's thirty simultaneous origin requests for the same file. With an origin shield, they all forward to one regional shield instead. The shield fetches from origin once, caches the response, and fans it back out.

[User, Tokyo]
      |
      v
[Edge PoP, Tokyo]          <-- cache hit? return immediately
      |
      v (on miss)
[Origin Shield, Osaka]     <-- cache hit? return and populate edges
      |
      v (on miss)
[Origin, US-West]          <-- single fetch, propagates back up the chain

A well-tuned three-tier setup lets fewer than 1% of edge requests reach origin for popular content. Origin shield is worth naming explicitly in an interview. It turns "CDN offloads traffic" from a vague hand-wave into a concrete architectural fact that an interviewer can actually write down.
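The thirty-edge scenario above can be sketched as a toy simulation. The class names (Tier, Origin) and the URL are made up for illustration, requests run one after another rather than concurrently, and TTLs are ignored. Real shields also collapse concurrent in-flight requests, which this sketch does not model.

```python
from collections import Counter


class Origin:
    """Stand-in for the origin server; counts every fetch it serves."""

    def __init__(self, stats):
        self.stats = stats

    def get(self, key):
        self.stats["origin:fetch"] += 1
        return f"body of {key}"


class Tier:
    """One cache layer that falls back to its parent on a miss."""

    def __init__(self, name, parent, stats):
        self.name = name
        self.parent = parent  # another Tier, or the Origin
        self.stats = stats
        self.store = {}

    def get(self, key):
        if key in self.store:
            self.stats[f"{self.name}:hit"] += 1
            return self.store[key]
        self.stats[f"{self.name}:miss"] += 1
        value = self.parent.get(key)  # escalate one level up
        self.store[key] = value       # populate this tier on the way back
        return value


stats = Counter()
origin = Origin(stats)
shield = Tier("shield", origin, stats)
edges = [Tier(f"edge-{i}", shield, stats) for i in range(30)]

for edge in edges:
    edge.get("/video/intro.mp4")  # every edge starts cold and misses once

print(stats["origin:fetch"], stats["shield:miss"], stats["shield:hit"])
```

Thirty cold edges produce one origin fetch: the shield misses once and answers the other twenty-nine requests from its own cache.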

Pull vs Push CDN: One Question Before You Choose

Pull CDN (lazy loading): The edge cache starts empty. When the first user in a region requests an asset, the CDN fetches it from origin, stores it, and serves it. Every subsequent user in that region gets the cached version until it expires. This is the default for most web applications.

Push CDN: You upload content to edge nodes proactively before any user requests it. No cache misses, ever. But you pay to store everything everywhere regardless of demand. Including that 800MB installer file that three people in Luxembourg will download once a year.

Property	Pull CDN	Push CDN
Cold start	First user in each region pays the miss	Pre-populated, zero misses
Storage cost	Only caches what users actually request	You store everything, everywhere
Best for	Dynamic traffic, unpredictable demand	Known large assets (installers, video files)
Control	Automatic via Cache-Control headers	Manual upload and management

Push CDNs make sense when the asset is large, the audience is global, and you need guaranteed zero-miss delivery from launch. Think a major software release or a film premiere on a streaming platform. For everything else, pull is simpler, cheaper, and doesn't require you to remember to push updates when content changes.
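The difference between the two models is small enough to show in one class. This is a rough sketch, not any provider's API: EdgeCache, fetch_from_origin, and the paths are hypothetical names, and a real edge also handles eviction, revalidation, and size limits.

```python
import time


class EdgeCache:
    """Toy edge cache supporting both pull (lazy) and push (proactive) population."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> body
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}        # path -> (body, expires_at)
        self.origin_fetches = 0

    def push(self, path, body):
        """Push model: store the object before anyone asks for it."""
        self.entries[path] = (body, self.clock() + self.ttl)

    def get(self, path):
        """Pull model: go to origin only on a miss or after expiry."""
        entry = self.entries.get(path)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        body = self.fetch_from_origin(path)
        self.origin_fetches += 1
        self.entries[path] = (body, self.clock() + self.ttl)
        return body


edge = EdgeCache(lambda path: f"origin copy of {path}", ttl_seconds=3600)
edge.get("/logo.png")   # first request in the region pays the miss
edge.get("/logo.png")   # served from the edge
edge.push("/installer.bin", "pre-uploaded bytes")
edge.get("/installer.bin")  # never touches origin
print(edge.origin_fetches)
```

Only the first pull of /logo.png reaches origin; the pushed installer is served without any origin fetch.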

Cache Only What Every User Gets the Same Way

Getting this wrong is the most common CDN mistake in interviews, and in production.

Cache these at the edge:

Static assets: images, CSS, JavaScript, fonts
Video and audio streams
Public HTML pages that are identical for every user
Public API responses that rarely change

Do not cache these at the edge:

Anything user-specific: account pages, feeds, dashboards
Anything that requires authentication or a per-user authorization check
Payment and checkout flows
High-write data where constant invalidation would negate any benefit

A common interview question: "Can you cache API responses in a CDN?" Yes, if the response is the same for any user who asks. A public product catalog, a list of trending posts, a live sports score with a 5-second TTL. The rule is: if the response contains user-specific data, it should be served from origin, not from an edge cache.

Caching a logged-in user's dashboard at the CDN layer and serving it to the next person who visits is a spectacular privacy bug, not a performance win. Do not do this. Interviewers notice.
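In practice the rule is enforced through the Cache-Control header your origin sends. A minimal sketch of that decision, where the function name, the path prefixes, and the TTL values are all illustrative assumptions rather than recommendations:

```python
# Hypothetical path prefixes for routes that must never be cached at the edge.
PRIVATE_PREFIXES = ("/account", "/checkout", "/feed", "/dashboard")


def cache_control_for(path: str, response_is_user_specific: bool) -> str:
    """Pick a Cache-Control value for a response; the rules here are examples only."""
    if response_is_user_specific or path.startswith(PRIVATE_PREFIXES):
        # Shared caches such as a CDN must not store this response.
        return "private, no-store"
    if path.startswith("/static/"):
        # Fingerprinted assets can be cached for a year.
        return "public, max-age=31536000, immutable"
    if path.startswith("/api/public/"):
        # Public but fast-changing data gets a short TTL.
        return "public, max-age=5"
    return "public, max-age=300"


print(cache_control_for("/dashboard", response_is_user_specific=True))
print(cache_control_for("/api/public/scores", response_is_user_specific=False))
```

The useful property is the default direction: anything flagged as user-specific is marked private before any path rule gets a chance to make it public.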

Cache Invalidation: The One Hard Problem

Phil Karlton famously said there are two hard things in computer science: cache invalidation and naming things. He was not joking. Cache invalidation is the one that actually bites you in production, because naming things mostly just embarrasses you in code review.

You have three practical strategies.

TTL (Time-to-Live). Set Cache-Control: max-age=3600 and the CDN serves the cached version for one hour before re-fetching. Simple, cheap, slightly stale by design. The tradeoff: if you push a bug fix, users see the old broken version until the TTL expires. Great for content that changes predictably. Terrible for "we just discovered a security vulnerability in this JavaScript file."

Cache Purge via API. All major CDNs expose an API to invalidate specific URLs or patterns immediately. If you update your homepage, call cdn.purge("/"). More precise than TTL, but every deploy needs to trigger the right purge calls. Tag-based invalidation extends this: Cloudflare and Fastly let you tag cached responses so you can purge everything tagged product:42 when that product changes, no matter how many URLs it appears on. That is genuinely powerful when you have one entity appearing across dozens of pages.
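To make tag-based purging concrete, here is an in-memory sketch of the idea. It is not the Cloudflare or Fastly API; TaggedCache and its methods are hypothetical, and the point is only the tag-to-URL index that makes a single purge call cover many URLs.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache that remembers which URLs carry which tags."""

    def __init__(self):
        self.objects = {}                     # url -> body
        self.urls_by_tag = defaultdict(set)   # tag -> set of urls

    def put(self, url, body, tags=()):
        self.objects[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def purge_url(self, url):
        """Invalidate one exact URL."""
        self.objects.pop(url, None)

    def purge_tag(self, tag):
        """Invalidate every cached URL that was stored with this tag."""
        removed = 0
        for url in self.urls_by_tag.pop(tag, set()):
            if self.objects.pop(url, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.put("/products/42", "<html>widget</html>", tags=["product:42"])
cache.put("/category/tools", "<html>tools</html>", tags=["product:42", "product:7"])
cache.put("/about", "<html>about</html>")

purged = cache.purge_tag("product:42")
print(purged, sorted(cache.objects))
```

One call removes both pages that mention product 42 and leaves /about untouched, which is exactly what per-URL purging makes tedious.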

Content Fingerprinting (Cache Busting). For JavaScript and CSS, embed a hash of the file contents in the filename: app.9a7f3.css. The filename changes on every build. The CDN never needs to be explicitly invalidated because the old URL and the new URL are different cache keys entirely. Set Cache-Control: max-age=31536000, immutable and the file is cached for a year. Your HTML just points to the new filename.

This is what every serious frontend build pipeline does, and it is the right answer for static assets in an interview.
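A build step that does this is only a few lines. The sketch below assumes SHA-256 and an 8-character digest, and fingerprinted_name is a hypothetical helper; real bundlers make their own choices about hash function and length.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a short content hash before the extension: app.css -> app.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = fingerprinted_name("static/app.css", b"body { color: red; }")
new = fingerprinted_name("static/app.css", b"body { color: blue; }")
print(old)
print(new)
```

Identical content always yields the same filename, and any change to the bytes yields a new one, so the old and new builds can sit in the cache side by side without conflict.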

There is also stale-while-revalidate (RFC 5861), which serves the cached version instantly while fetching a fresh copy in the background. Slightly stale, always fast. The CDN equivalent of answering from memory while quietly Googling under the table.
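The behavior comes down to which of three windows the cached object's age falls into. A small sketch of that decision, where cache_state and its return labels are illustrative names and the numbers correspond to a header like Cache-Control: max-age=600, stale-while-revalidate=30:

```python
def cache_state(age_seconds: float, max_age: int, stale_while_revalidate: int) -> str:
    """Classify a cached response by its age in seconds."""
    if age_seconds < max_age:
        return "fresh"                  # serve from cache, no origin contact
    if age_seconds < max_age + stale_while_revalidate:
        return "stale-revalidate"       # serve the stale copy, refresh in the background
    return "expired"                    # must fetch from origin before responding


for age in (10, 615, 700):
    print(age, cache_state(age, max_age=600, stale_while_revalidate=30))
```

The user only waits on origin in the third case; in the middle window the refresh happens off the request path.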

Don't Say CDN Until You Mean It

"We'll add a CDN for performance" is a sentence that sounds like an answer and contains zero information. Interviewers have heard it so many times it has become auditory wallpaper.

Do not volunteer "add a CDN" five minutes into every problem as a reflex. The right moment is when you are discussing how content reaches users across regions, how to handle traffic spikes on read-heavy paths, or how to reduce latency for a global audience.

The natural structure:

Identify what content in your design is static or publicly cacheable
State you would serve it via CDN with appropriate TTLs
Briefly explain cache-miss escalation through the hierarchy
Address your invalidation strategy when content changes

For a video platform, CDN is the entire delivery layer, not a footnote. For an e-commerce site, product images and CSS go through CDN; checkout goes through origin. For a chat app, CDN is irrelevant for messages but relevant for media attachments.

The mistake is mentioning CDN without connecting it to a real bottleneck. "We'll add a CDN for performance" is wallpaper. "Static assets are served from CDN with a 24-hour TTL, and we use content fingerprinting so JS and CSS updates never require manual invalidation" is a complete thought an interviewer can write down.

One tells the interviewer you know the word. The other tells them you understand the system.

Name These Tradeoffs Before the Interviewer Asks

Every design choice has costs. Naming them yourself signals maturity. Waiting until the interviewer asks signals that you only know the happy path.

Stale data. CDN caches are out of date by design. For truly real-time data, CDN is the wrong layer. Know your freshness requirements before choosing a TTL.

Cost. CDN egress is not free. At scale it is a significant line item. Model your expected cache hit ratio. Low-traffic content on a pull CDN tends to expire before anyone requests it again, so you pay for misses without getting the offload. Rarely-accessed content on a push CDN wastes storage. Neither tradeoff is obvious until someone sees the bill.

Invalidation complexity. Purging one URL is simple. Purging interdependent content (a product that appears on 40 category pages), across multiple CDN providers, with guaranteed consistency across all edge nodes simultaneously, is a hard operational problem. It is not solved for free.

Cold starts in new regions. The first user in any PoP always pays the miss. For global launches or sudden traffic spikes in unexpected regions, you may need to pre-warm caches by crawling your own URLs through the CDN before users arrive. This is a real operational runbook item at companies that care about launch performance.
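A pre-warm script can be as simple as requesting each URL once through the CDN hostname. The sketch below uses only the standard library; prewarm, fetch_status, and the User-Agent string are made-up names. One caveat it does not solve: a request warms only the PoP it happens to be routed to, so warming several regions means running it from machines in those regions.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10.0):
    """Fetch one URL through the CDN and return the HTTP status code."""
    request = Request(url, headers={"User-Agent": "cache-prewarm-sketch"})
    with urlopen(request, timeout=timeout) as response:
        response.read()  # read the full body so the edge stores the whole object
        return response.status


def prewarm(urls, fetch=fetch_status, workers=8):
    """Request every URL once; return {url: status code or the exception raised}."""
    urls = list(urls)

    def attempt(url):
        try:
            return fetch(url)
        except Exception as exc:  # record the failure and keep warming the rest
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(attempt, urls)))
```

The fetch parameter is injectable so the function can be exercised without network access, for example prewarm(["/a", "/b"], fetch=lambda url: 200).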

What This Looks Like in a Real Design

For a YouTube-style platform, CDN is not an afterthought. It is a multi-tier delivery network where video chunks are pre-positioned at edge nodes before users request them, adaptive bitrate streaming adjusts quality per-segment, and popular content effectively never touches origin after the first day. The CDN is the system at that scale.

Explaining all of that out loud, while someone is asking follow-up questions in real time, is a different skill than understanding it on paper. SpaceComplexity runs voice-based mock system design interviews with rubric-based feedback on how clearly you explain architectural choices like these. If CDN is a concept you know but struggle to articulate under pressure, that is exactly the gap mock interviews close.

For a deeper look at how a CDN is itself designed as a system design problem, see the CDN system design walkthrough. For distributed caching behind your origin servers, see the distributed cache system design guide. CDN and origin caching answer different questions and are best combined.