Memoization Time Complexity: One Formula for Every Top-Down DP

- Memoization converts an exponential recursion tree into a DAG where each unique state runs exactly once.
- Memoization time complexity = distinct states × work per state; cache hits are O(1) and excluded from the per-state cost.
- Space complexity splits into memo table size (one entry per state) plus maximum call stack depth.
- Stack depth equals the longest sequential dependency chain, not the total number of unique states.
- Grid problems with m×n states typically have only O(m+n) stack depth, not O(m×n).
- Per-state loops multiply into total time: coin change costs O(amount × k), not O(amount), because each state iterates over k coin denominations.
You write a recursive solution. It's elegant. The recurrence is obvious. Then you clock the complexity and it's O(2ⁿ). You add a dictionary. Now it's O(n). That feels like a magic trick. It isn't. The transformation has a precise mechanical explanation, and it gives you a formula for memoization time complexity that you can apply to any top-down DP: time equals the number of distinct states multiplied by the work done per state.
Understand that formula and you can analyze any top-down DP from scratch, in an interview, without guessing.
The Tree That Recomputes Everything
Fibonacci is the clearest example because the redundancy is so visible. The recurrence is F(n) = F(n-1) + F(n-2), so each call spawns two children:
def fib(n): if n <= 1: return n return fib(n - 1) + fib(n - 2)
Call fib(5). It calls fib(4) and fib(3). fib(4) calls fib(3) and fib(2). That second fib(3) call (the one triggered by fib(4)) is identical to the first. It recomputes the same subtree. fib(2) shows up three times. fib(1) shows up five times.

Fifteen calls for n=5. The tree grows at O(φⁿ) where φ ≈ 1.618. For n=50, that's roughly 2.3 trillion calls. You will get the answer. You just will not be around to see it.
The number of nodes satisfies T(n) = T(n-1) + T(n-2) + 1, the same recurrence as Fibonacci itself. The growth rate is O(φⁿ) where φ ≈ 1.618.
What Caching Actually Changes
When you memoize, you store each result the first time it's computed and return the cached value on every subsequent call. The structural effect is what matters. Memoization turns the recursion tree into a directed acyclic graph (DAG).
In the tree, every node is a full recursive computation. In the DAG, every unique state is a single node, and duplicate calls become edges pointing to an already-computed node rather than new subtrees. fib(3) exists once in the DAG. Both callers that would have recursed into it instead do a hash lookup and get the answer in O(1).

You are not speeding up each computation. You are refusing to do the same work twice. Sounds obvious in hindsight.
The exponential tree does not get faster. It gets replaced by a much smaller structure. That replacement is what caching achieves.
The Memoization Time Complexity Formula
Once you see the DAG, the complexity follows directly. Each node is computed exactly once. The total work is the sum of work across all nodes. Factor that sum:
Total time = (number of distinct states) × (work per state)
"Work per state" means the work done inside a single call, excluding any recursive calls. Those recursive calls hit the cache and cost O(1) each. They do not count toward the per-state cost.
For Fibonacci: there are n+1 distinct states (F(0) through F(n)). Each state does one addition and two cache lookups. Work per state is O(1). Total: (n+1) × O(1) = O(n).
The O(φⁿ) tree collapsed to O(n) not because the algorithm became clever, but because each unique computation now runs exactly once. That's the whole mechanism. Write it down.
Tracing fib(5) Call by Call
def fib(n, memo=None): if memo is None: memo = {} if n in memo: return memo[n] if n <= 1: return n memo[n] = fib(n - 1, memo) + fib(n - 2, memo) return memo[n]
Trace fib(5) and count what actually executes:
fib(5): cache miss. Callsfib(4).fib(4): cache miss. Callsfib(3).fib(3): cache miss. Callsfib(2).fib(2): cache miss. Callsfib(1)(returns 1) andfib(0)(returns 0). Storesmemo[2] = 1.- Back in
fib(3): callsfib(1). Returns 1. Storesmemo[3] = 2. - Back in
fib(4): callsfib(2). Cache hit, returns 1. Storesmemo[4] = 3. - Back in
fib(5): callsfib(3). Cache hit, returns 2. Storesmemo[5] = 5.
Six unique states (fib(0) through fib(5)), all computed exactly once. The right branch of every call from fib(4) upward is already in the cache. The algorithm did not get smarter. It just started keeping notes.

Note: fib(0) and fib(1) return early via the base-case check and are never stored in memo. fib(1) actually runs twice (called from both fib(2) and fib(3)), but each run is O(1), so it doesn't change the asymptotic story.
What Happens With a Two-Parameter State
The formula extends naturally when your state has more than one parameter.
Coin Change: given denominations coins and a target amount, find the minimum number of coins to make amount. The recurrence: dp(r) = 1 + min(dp(r - c) for c in coins).
def coin_change(coins, amount): memo = {} def dp(remaining): if remaining in memo: return memo[remaining] if remaining == 0: return 0 if remaining < 0: return float('inf') memo[remaining] = 1 + min(dp(remaining - c) for c in coins) return memo[remaining] result = dp(amount) return -1 if result == float('inf') else result
Apply the formula:
- States: the
remainingparameter ranges from 0 toamount. That isamount + 1distinct values. - Work per state: for each state, we loop over all
k = len(coins)denominations. Each iteration does one subtraction and one cache lookup. Work per state is O(k).
Total time: O(amount × k). Without memoization, the recursion branches k ways at each level, giving O(kᵃᵐᵒᵘⁿᵗ) in the worst case. Memoization collapses that to a polynomial in the two natural dimensions of the problem.

A 2D state space follows the same logic. For a grid path problem with state (row, col), there are m × n states and O(1) work per state, giving O(m × n) total.
Space Has Two Separate Bills
Memoization adds space in two places, and they must be counted separately. Miss either one and your space analysis is wrong.
The memo table stores one entry per distinct state. Fibonacci: O(n). Coin Change: O(amount). A 2D grid: O(m × n). This part is straightforward.
The call stack holds the currently active frames. "Active" means called but not yet returned. With memoization, a cache hit returns immediately and does not deepen the stack. So the stack depth equals the longest chain of unique states that can be live simultaneously.
For Fibonacci, the longest chain runs from F(n) down through F(n-1), F(n-2), ..., F(0). That is n+1 frames deep. Stack depth is O(n).
Total space = O(memo table size) + O(max stack depth).
For Fibonacci: O(n) + O(n) = O(n). Same asymptotic result, two distinct contributions.
The Stack Depth Trap
Here is where complexity analyses go wrong. Stack depth is not the number of unique states. It is the longest sequential dependency chain.
Consider grid paths: how many ways can a robot move from (0, 0) to (m-1, n-1) moving only right or down? The state is (row, col). There are m × n states. But the deepest dependency chain from (m-1, n-1) back to (0, 0) has at most m + n - 2 steps. Stack depth is O(m + n), not O(m × n).

This distinction matters in practice. A problem with 10⁶ states but a dependency chain depth of 1,000 is fine. The same problem with a chain of depth 10⁶ will hit the default recursion limit and crash. Python caps the call stack at 1,000 frames by default. It will raise a RecursionError and walk away. That is when you either increase the limit or convert to bottom-up tabulation.
When Work Per State Is Not O(1)
The formula only gives the right answer if you correctly identify the per-state cost.
For Coin Change, each state runs a loop over k coin types. The work per state is O(k), not O(1). The total is O(amount × k). Calling it O(amount) misses the coin count entirely.
For longest common subsequence with strings of length m and n, each state (i, j) does one comparison and a max of two cache lookups. Work per state is O(1). Total: O(m × n).
The rule: exclude recursive calls from per-state cost. They are O(1) cache hits. Count everything else. If a state loops, that loop cost goes into the formula. If a state does a comparison or arithmetic, it's O(1) and vanishes into the constant.
What You Actually Changed
Memoization does one concrete thing: it prevents the same arguments from being processed twice. That converts the recursion tree (a structure with repeated subtrees) into a DAG (a structure where each node appears once). Once you have a DAG, the total work is just the sum of work across all nodes, and that sum factors cleanly.
The formula is:
Time = (distinct states) × (work per state, excluding recursive calls)
Space = (memo table size) + (maximum call stack depth)
If you want to go deeper on framing DP problems before you get to the complexity analysis, the dynamic programming framework post walks through how to identify states and transitions in the first place.
Explaining state counts and complexity derivations out loud is different from writing them in a notebook. SpaceComplexity runs voice-based mock interviews with structured feedback, so you can practice the way you would in an actual interview.
Recap:
- Naive recursion with overlapping subproblems builds an exponential tree full of repeated subtrees.
- Memoization replaces that tree with a DAG where each unique state is computed exactly once.
- Time = distinct states × work per state (cache hits are O(1) and excluded from per-state cost).
- Space splits into the memo table (one entry per state) and the call stack (deepest sequential dependency chain).
- Stack depth is not the number of states. It is the longest chain. For grid problems with m×n states, the chain is often only O(m+n) deep.
- When a state involves a loop over k items, that k multiplies into the total time. Do not miss it.