Memoization vs Caching: Two Tools That Sound the Same, Work Differently

Every engineer has called their memo = {} dictionary a "cache" in a code review. The senior engineer across the table winced but said nothing, because correcting it would take longer than the meeting. Then they went home and wrote a Confluence page that nobody read.

This conflation causes real bugs: people skip cache invalidation in systems that desperately need it, or bolt eviction logic onto algorithms that never needed it. So let's untangle these two things once and for all.

A Cache Is a Box. Memoization Is a Rule.

Caching is a technique. Memoization is a specific application of that technique to pure functions.

A cache is any storage layer that holds a computed or fetched result so you don't have to repeat the work. Your CPU's L1 cache holds recently accessed memory bytes. A CDN caches HTTP responses. Redis caches database query results. All of them trade memory for speed.

Memoization takes that idea and narrows it to one context: a function that, given the same inputs, always returns the same output. You store the return value keyed by the input arguments. Next time the function is called with those same arguments, you return the stored value and skip the function body entirely.

def fib(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

That dictionary is a cache. The rule "store results keyed by input, return early on a hit" is memoization.

All memoization is caching. Not all caching is memoization.

Three Places the Distinction Matters

Scope

Memoization is always function-level. The cache key is the input. The cached value is the return value. Nothing else. The whole thing fits in a dictionary.

A general cache can sit at any layer: HTTP (Cache-Control headers), application (Redis, Memcached), database (query result cache), or hardware. It can cache response bodies, serialized objects, HTML fragments, anything your team can convince themselves doesn't need to be real-time.

Invalidation (Phil Karlton Was Not Talking About Your Dictionary)

This is where the conflation creates actual bugs.

Memoization applied to a pure function never needs invalidation. A pure function returns the same output for the same input every time. fib(10) is 55 today, tomorrow, and after a server restart. There is nothing to invalidate. The cache can grow for the lifetime of the program and never go stale.

A general cache is different. The data it holds reflects external state: a database row, a third-party API response, a user's session. That state changes. A product price fetched at 9am is wrong by noon.

This is why Phil Karlton said, "There are only two hard things in computer science: cache invalidation and naming things." He was not talking about your memo = {} dictionary. He was talking about the Redis cluster that your product manager just assured a customer will "always reflect the latest data."

General caches need invalidation strategies: TTL (time-to-live), event-driven invalidation on write, explicit purge endpoints, versioned cache keys. Entire engineering careers have been shaped by cache invalidation bugs. Stack Overflow threads about stale data go back to 2008 and are still getting new answers.

If you find yourself adding TTLs to a memoized recursive function, stop. Put the keyboard down. Either the function is not actually pure, or you're solving the wrong problem.

Eviction

Memoization tables grow until the program exits, or until you bound them manually. For a function with n distinct inputs, the table holds at most n entries. That's fine for finite state spaces: Fibonacci indices up to some limit, grid positions in an m x n grid.

General caches have finite capacity and need eviction policies: LRU, LFU, TTL-based expiry, size-based caps. For memoization in DP problems, you don't think about eviction at all. The cache fills, you read the answer, the program ends. Clean.

Where Memoization Changes the Math

General caching improves performance. Memoization can change the asymptotic complexity of an algorithm. That's a different category of improvement entirely.

Take naive recursive Fibonacci.

def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

             fib(5)
           /        \
        fib(4)     fib(3)
        /    \     /    \
     fib(3) fib(2) fib(2) fib(1)
     /   \
  fib(2) fib(1)

Every node is a function call. fib(3) appears twice, fib(2) three times. The tree doubles roughly at every level. Time complexity: O(2^n). For n=40, that's over a billion calls. Your laptop's fan will have opinions.

Add memoization and fib(3) is computed once. Every subsequent call hits the dictionary and returns immediately. The tree collapses to a line.

fib(5) -> fib(4) -> fib(3) -> fib(2) -> fib(1)
                           -> fib(0)
          [cached hits for the right branches]

Time complexity: O(n). Space complexity: O(n) for the memo table, plus O(n) call stack depth.

A Redis cache in front of a slow database endpoint does not change the algorithmic complexity of anything. It reduces latency by avoiding repeated work at runtime. Valuable. But it's a constant factor improvement, not an order-of-magnitude change in what your code is even doing.

Memoization transforms exponential algorithms into polynomial ones. General caching makes polynomial algorithms faster in practice.

For more on how memoization computes exactly as many states as the problem has, and why the formula is states × work per state, see Memoization Time Complexity: One Formula for Every Top-Down DP.

The Same Logic, Two Dimensions

Unique paths in an m x n grid: how many ways can you move from top-left to bottom-right, moving only right or down?

Recursive without memoization: exponential. The same subgrid (r, c) gets recomputed from every ancestor cell.

def paths(r, c, rows, cols, memo={}):
    if (r, c) in memo:
        return memo[(r, c)]
    if r == rows - 1 or c == cols - 1:
        return 1
    memo[(r, c)] = (
        paths(r + 1, c, rows, cols, memo) +
        paths(r, c + 1, rows, cols, memo)
    )
    return memo[(r, c)]

Distinct states: m × n (every grid cell). Work per state: O(1). Total: O(m × n) time, O(m × n) space.

A caching layer at the application level would avoid re-running the computation across requests. Useful. But it doesn't touch the internal complexity. Memoization does.

When to Use a General Cache

Memoization doesn't apply when:

The function has side effects (reads from a database, writes to a file, calls an API). Same inputs can produce different outputs depending on external state. You can't cache it with a dictionary and pretend it's fine.
The result space is potentially unbounded and you need memory control. You need eviction logic, which memoization doesn't provide.
You're caching across process boundaries. Memoization is in-memory, in-process. If you have a fleet of servers, each one builds its own memo table from scratch. Redis solves this; per-process memoization does not.
Results have a natural expiry. Price data, inventory counts, user sessions. These need TTL-based invalidation. Memoization will happily serve you a product price from six hours ago.

A rule of thumb: if asking "has this changed?" is a meaningful question, you need a general cache. If the answer is always no by definition, memoization works.

The Tradeoffs Side by Side

	Memoization	General Cache
Scope	Single pure function	Any layer, any data
Invalidation needed	No	Yes
Eviction policy	Usually none	LRU, LFU, TTL, etc.
Across processes	No	Yes (Redis, Memcached)
Complexity impact	Can be transformative	Constant factor reduction
Memory growth	Bounded by state space	Requires explicit caps
Staleness risk	None (pure functions)	Always present

The Space Cost You Can't Ignore

Both techniques trade memory for speed.

Memoization's memory cost equals the number of distinct subproblems times the size of each stored value. For Fibonacci, that's O(n) integers. For a two-dimensional DP problem, that's O(m × n). For bitmask DP like the Traveling Salesman Problem, the memo table can be O(n × 2^n), which is still exponential, just better than O(n!) brute force. Progress.

If a memo table is too large, you sometimes switch to tabulation (bottom-up DP), which lets you discard rows you no longer need. The full comparison of top-down memoization and bottom-up tabulation is worth reading if you're hitting memory limits.

General caches have the opposite pressure: if the cache grows unbounded, you either add eviction logic or accept OOM errors at 3am. Redis configurations typically cap memory and define what to evict when the limit is hit, which is how the phrase "Redis eviction policy" became something engineers have opinions about at standup.

How to Decide Which One You Need

Ask three questions in order.

Is the function pure? Same inputs always produce the same output, no side effects. If yes, memoization is a candidate. If no, you need a general cache with invalidation logic.

Is the problem recursive with overlapping subproblems? If the same subproblem appears multiple times in a call tree, memoization will collapse that redundancy and may fundamentally change your time complexity. This is the dynamic programming framework signal: overlapping subproblems plus optimal substructure means DP applies.

Do results cross process boundaries or expire? If you need the cache to survive restarts, be shared across servers, or handle stale data, you need a general caching layer. In-process memoization won't help. You'll need Redis. Everyone ends up needing Redis.

Most interview problems with recursive solutions and overlapping subproblems want memoization. Most systems-level performance problems want a general cache. The two rarely overlap in the same solution.

Talking through this tradeoff clearly in an interview is a different skill than knowing the theory cold. SpaceComplexity runs voice-based mock interviews that test exactly this kind of reasoning under pressure, with rubric-based feedback on how well you communicate your choices.