Cache Hit vs Cache Miss: The 100x Speed Gap Behind Every Lookup

- Cache hit vs cache miss: a hit returns data immediately from the fast store; a miss falls through to a slower store (RAM, disk, or network) at 100x to 1,000,000x the cost
- Latency numbers: L1 hit ~1 ns; DRAM after a cache miss ~80–100 ns; network round-trip ~1,000,000 ns — the gap is not academic
- Three cache miss types: compulsory (first access), capacity (eviction when full), conflict (address collision) — compulsory misses are what memoization models
- Cache hit rate formula =
hits / (hits + misses); a 99% hit rate can still be dominated by the 1% of misses if each miss costs 100x more - Memoization connection: each unique subproblem is one cache miss; total unique subproblems equals the number of misses, which equals the O(...) complexity bound
- LRU cache design forces you to solve all three cache decisions at once: detect hit vs miss, return on miss, evict on full
- Arrays beat linked lists in practice because contiguous memory maximizes cache hits; pointer chasing in linked lists causes a miss on nearly every node
Your code asks for some data. One of two things happens: it's right there, ready in nanoseconds, and your program breezes through. Or it isn't, and your program goes on a field trip to find it, burning thousands of nanoseconds along the way while the CPU sits there staring at the ceiling.
That's a cache hit versus a cache miss. It sounds almost too simple to matter. It's the reason two algorithms with identical Big-O complexity can run at completely different speeds in practice, and it's the reason profiling a "slow" function often reveals that the CPU isn't doing any work at all. It's just waiting.
The Cache: A Tiny, Fast Store Next to a Big, Slow One
A cache is a small, high-speed storage layer positioned between your code and a larger, slower data source. The idea appears everywhere in computing:
- CPU cache sits between the processor and RAM, holding recently used memory.
- Memoization cache sits between a recursive function and its recomputed results.
- Redis / Memcached sits between your application and a database.
- CDN sits between a user and your origin server.
Every cache bets on the same thing: if you accessed something recently, you'll probably need it again soon. That's temporal locality, the observation that access patterns in real programs are not random, and recently touched data is overwhelmingly likely to be touched again.
What Is a Cache Hit?
A cache hit is when the data you asked for is already there. You request it, the cache checks, finds it, and hands it back immediately. No round-trip, no waiting, no drama.
In code, this looks like a memoization lookup that finds its key:
memo = {} def fib(n): if n in memo: # cache hit: return immediately return memo[n] if n <= 1: return n memo[n] = fib(n - 1) + fib(n - 2) return memo[n]
Every call to fib(n) after the first one is a hit. The cache hit is why memoization collapses O(2^n) recursion into O(n): the duplicated subproblems don't recompute, they just read a stored answer.
At the CPU level, a cache hit means the processor found the memory address in L1 or L2 cache and got the value without touching RAM. The whole operation takes about one to five nanoseconds. Your CPU barely had to think.
What Is a Cache Miss?
A cache miss is when the data isn't there. The system has to fall back to the slower store, fetch the value, possibly save it for next time, and then finally return it to you.
The first call to fib(n) for any new value of n is a miss. In CPU terms, a miss means sitting around waiting for a value to arrive from RAM. Or, in the worst cases, from disk or a network call.
There are three kinds of cache misses worth knowing by name:
| Type | Also Called | When It Happens |
|---|---|---|
| Compulsory | Cold miss | First time you ever access this address |
| Capacity | Cache is full; older data got evicted | |
| Conflict | Multiple addresses map to the same cache slot |
Cold misses are the ones you're modeling in most interview contexts. The first time you compute a subproblem, it's a miss. Every time after that, it's a hit. The ratio of hits to misses is what determines whether your cache is actually helping or just hogging memory.
How Wide Is the Gap, Really?
Wide. Very wide. Embarrassingly wide.

The latency difference between an L1 hit and a network miss spans six orders of magnitude. Here's the full picture:
| Storage Level | Approximate Latency | Rough Capacity |
|---|---|---|
| CPU L1 cache | ~1 ns | 32 KB |
| CPU L2 cache | ~5 ns | 256 KB |
| CPU L3 cache | ~15 to 40 ns | 4 to 32 MB |
| Main memory (DRAM) | ~80 to 100 ns | Gigabytes |
| SSD | ~100,000 ns (0.1 ms) | Terabytes |
| Network round-trip | ~1,000,000 ns (1 ms) | varies |
A single DRAM miss costs roughly 100x more than an L1 hit. A database miss over the network costs a million times more. Not 100% more. A million times more. When you profile a slow program and the bottleneck isn't obvious, this is usually what's happening: the CPU is done, it's just waiting for data to show up.
This is also why arrays beat linked lists in practice, even though both traversals are O(n). An array is contiguous in memory, after the CPU loads one element, the next one is probably already in the same 64-byte cache line (16 int32 values fit in one line). A linked list scatters nodes across memory. Every ->next pointer dereference is a potential cache miss. See Array vs Linked List Performance: Same Big-O, Completely Different Speed for how that plays out in benchmarks, or What Is Memory Locality? for the hardware side in detail.
Cache Hit Rate: The Number That Summarizes Everything
Cache hit rate is the fraction of lookups that find their data in the cache:
hit rate = hits / (hits + misses)
A 99% hit rate sounds great. But one miss might cost 100x more than a hit, so that one miss dominates total runtime if you're not careful. A 90% hit rate against a database with a 1 ms round-trip means 10% of your requests are waiting 1,000 microseconds each, while the other 90% finish in under one. The math is not flattering.
For a cache to be worth the memory it occupies, the hit rate needs to be high enough that average access cost comes down meaningfully. A Redis cache for a hot database table should hit 95%+. A memoization table for Fibonacci reaches nearly 100% after warmup because there are exactly n unique subproblems and you compute each one exactly once.
A Concrete Example: The LRU Cache Problem
LeetCode 146 is the interview problem that forces you to design a cache from scratch. Fixed capacity. Evict the least-recently-used item when it's full. The interviewer will assign it to you and then watch what you reach for first.
from collections import OrderedDict class LRUCache: def __init__(self, capacity: int): self.capacity = capacity self.cache = OrderedDict() def get(self, key: int) -> int: if key not in self.cache: # cache miss return -1 self.cache.move_to_end(key) # mark as recently used return self.cache[key] # cache hit def put(self, key: int, value: int) -> None: if key in self.cache: self.cache.move_to_end(key) self.cache[key] = value if len(self.cache) > self.capacity: self.cache.popitem(last=False) # evict LRU item
get returns -1 on a miss and the cached value on a hit. put handles eviction: when the cache is full, the item that hasn't been touched for the longest time gets the boot.
The reason this problem shows up constantly is that it forces you to think through all three decisions every cache design involves:
- How do you detect a hit vs a miss? (hash map lookup)
- What do you return on a miss? (sentinel value or fallback logic)
- What do you evict when the cache is full? (the eviction policy)
Get all three right under time pressure and you're in good shape. For a full walkthrough of the underlying data structure, see LRU Cache Implementation. For how LRU compares to LFU eviction, see LRU vs LFU Cache.
Why Coding Interviews Care About This
Cache hit and miss reasoning shows up in three places.
Memoization time complexity. When you add a memo table to a recursive function, you're saying: every unique input is a miss on first call (do the work), and a hit on every repeat call (return early). The number of unique subproblems is the number of misses, and that's what determines the O(...) bound. Fibonacci: O(n). 2D DP grid: O(m * n). The formula is always: time = states * work per state, where "states" equals total misses. Memoization Time Complexity walks through the full derivation.
System design questions. When a design prompt involves a database, a CDN, or a high-throughput API, caching is almost always the right answer for read-heavy traffic. You're expected to know what hit rate makes a cache worth it, how to handle stale data (TTL, write-through, write-around), and which eviction policy fits the access pattern. What Is a Cache Eviction Policy? covers that last piece.
Cache-friendly algorithm choices. At senior levels, interviewers sometimes probe memory access patterns directly. Choosing a flat array over a linked structure, or iterating a matrix row-by-row instead of column-by-column, can mean the difference between a solution that runs within time limits and one that mysteriously times out on large inputs. This surfaces most often in C++, Go, and Rust interviews where allocator behavior is part of the discussion.
If you want to practice reasoning through tradeoffs like these out loud under real time pressure, SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback across exactly these dimensions: problem solving, communication, and technical depth.
The Space-Speed Tradeoff
Every cache is a bet: spend more memory now, pay less time later. That tradeoff is explicit in memoization, you store O(n) or O(m*n) results to avoid exponential recomputation, and implicit in hardware, where L1 cache is fast precisely because it's tiny, and RAM is slower precisely because there's so much of it.
The break-even is simple:
- If the data will be requested many times, cache it. The hit rate pays off the memory cost.
- If the data will only be requested once, caching wastes memory without saving any time.
This is why session data for active users lives in Redis, and why historical logs go straight to cold storage. Most interview questions implicitly assume a high-hit-rate workload. If the workload has low hit rates, the cache isn't the answer, it's the problem.