Memoization Time Complexity Formula: From O(2^n) to O(n) in One Step
- Tree-to-DAG transformation is why memoization works: every unique subproblem is solved exactly once instead of being recomputed
- Time = unique states × work per state — exclude recursive sub-calls from per-state work because cache hits are O(1)
- Space has two independent components: memo table size (one slot per unique state) and call stack depth (longest path in the call DAG)
- Stack depth is not the state space size: Grid Paths has O(m×n) memo but only O(m+n) stack depth — interviewers push on this distinction
- Memoization only pays off when subproblems overlap — generating all permutations has no overlap, so adding a cache wastes memory with zero speedup
Add a cache and O(2^n) becomes O(n). That is not magic. It is one transformation: your call tree collapses into a DAG, and every unique subproblem gets solved exactly once.
Once you see that, the memoization time complexity formula is just arithmetic: unique states times work per state. The rest is knowing which number goes where. This post derives it for three problems so you can reconstruct the answer on the fly, not recite a memorized number.
The Problem With Naive Recursion
Take Fibonacci:
def fib(n): if n <= 1: return n return fib(n - 1) + fib(n - 2)
Draw the call tree for fib(5). fib(3) appears twice. fib(2) appears three times. fib(1) appears five times. Your computer is doing the same homework over and over and there is nobody stopping it.
The recurrence T(n) = T(n-1) + T(n-2) + O(1) solves to O(2^n). At n=40 that is roughly a trillion operations. At n=50, you are not waiting until the program finishes. You are waiting until retirement.
The cause is not recursion itself. The same subproblem, fib(k), gets solved repeatedly because nothing persists the answer. This is the overlapping subproblems property, and it is exactly what memoization fixes.
What Memoization Actually Does to the Call Tree
Add a cache:
def fib(n, memo={}): if n in memo: return memo[n] if n <= 1: return n memo[n] = fib(n - 1) + fib(n - 2) return memo[n]
Trace fib(5) again. The left branch computes fib(4), then fib(3), then fib(2), then fib(1) and fib(0). Every result stored. When the right branch tries fib(3), it hits the cache and returns in O(1). fib(2), fib(1). All cache hits.
The exponential tree has collapsed into a directed acyclic graph (DAG). Each unique node is visited exactly once.

Left: 15 calls to compute fib(5). Right: 5. Same algorithm, one dictionary.
Tree becomes DAG. Duplicate subtrees become cache hits. That is the whole trick. Everything else in this post is applying the same logic to messier state spaces.
The Memoization Time Complexity Formula
Four steps, every time:
- Find the parameters that uniquely determine the return value. These are your state.
- Count the unique combinations of those parameters. That is your state space.
- Count the work done inside one call, after the cache lookup, excluding recursive sub-calls. Those sub-calls are O(1) cache hits now.
- Multiply:
Time = unique states × work per state.
For Fibonacci: one parameter n, values 0 through n, so n+1 unique states. Inside each call: one addition. O(1) work per state. Total: O(n) time.
Space has two separate components. The memo table holds one entry per unique state: O(n). The call stack holds one frame per level of the deepest call chain. The deepest chain is fib(n) -> fib(n-1) -> ... -> fib(0), n levels deep. Stack depth is O(n). Total space: O(n) memo + O(n) stack = O(n).
Coin Change: When the State Space Gets Interesting
Given coin denominations and a target amount W, find the minimum number of coins.
def coin_change(coins, amount, memo={}): if amount in memo: return memo[amount] if amount == 0: return 0 if amount < 0: return float('inf') best = float('inf') for coin in coins: result = coin_change(coins, amount - coin, memo) best = min(best, result + 1) memo[amount] = best return best
Without memoization, this branches k ways at each level and the depth reaches W. Worst case: O(k^W). That is not "somewhat slow." That is "your laptop fans spin up and nothing happens."
Apply the formula. The state is one parameter: amount, ranging 0 to W. So W+1 unique states. Inside each call you loop through k coins, O(1) each. That is O(k) work per state. Total time: O(W * k).
Space: memo table has one slot per amount, O(W). The deepest call chain is when you subtract 1 each time: amount -> amount-1 -> ... -> 0, W levels deep. Stack depth is O(W). Total space: O(W).
Grid Paths: Where Stack Depth Trips People Up
Count the unique paths through an m by n grid moving only right or down.
def unique_paths(m, n, memo={}): if (m, n) in memo: return memo[(m, n)] if m == 1 or n == 1: return 1 memo[(m, n)] = unique_paths(m - 1, n, memo) + unique_paths(m, n - 1, memo) return memo[(m, n)]
Without memoization: every cell branches two ways, the path is m+n-2 steps, and total calls balloon toward O(2^(m+n)).
With memoization: the state is the pair (row, col). Rows 1 to m, columns 1 to n. So mn unique states. Inside each call: one addition. **Total time: O(mn).**
Here is where interviews go wrong, reliably and entertainingly. The memo table holds one entry per cell: O(mn). But call stack depth is not mn. Every recursive call decrements the row or the column. The deepest chain is m+n-2 steps, all the way down then all the way left. Stack depth is O(m+n), even though the memo table is O(m*n).
Total space: O(mn) memo + O(m+n) stack. Memo dominates, so O(mn) is fine as a headline answer. But if an interviewer pushes you to separate the two components, you want to have this ready. They are measuring different things.
The Mistakes Interviewers Actually See
Counting recursive sub-calls as "work per state" is the most common error. It sounds like this: "Each state does O(k) work because it makes k recursive calls." Wrong. Those calls return from cache in O(1). You are double-counting. Work per state is the computation after the lookup, before returning. Recursive calls are free at analysis time.
Saying space is O(states) and stopping there. For grid paths, "O(mn) space" is correct but incomplete. Memo table is O(mn) and call stack is O(m+n). Both matter. Memo table = number of unique states. Stack depth = length of the deepest path in the call DAG. Do not conflate them.
Applying memoization where subproblems do not overlap. If every recursive call produces a unique subproblem, you added a hash table for zero gain. The classic case: generating all permutations. Each call produces a genuinely new state. Adding a cache wastes memory and slows things down. Memoization only pays off when the overlapping subproblems condition holds.
A Clean Method for Any Memoized Recursion
Five questions, any problem, on the spot:
- What are the state parameters? Every argument that affects the return value.
- How many unique states exist? Multiply the ranges of each parameter.
- What work happens inside one call? Loops, string ops, anything that is not a recursive call.
- What is the deepest call chain? The path with the most recursive steps before a base case.
- Combine:
Time = unique states × work per state.Space = memo table size + stack depth.
This is the same method from recursion time complexity analysis, applied after the tree-to-DAG transformation. For call stack accounting, see recursion space complexity.
For why overlapping subproblems and optimal substructure are the two conditions DP requires, see dynamic programming framework.
Practice This Under Interview Conditions
Knowing the formula is step one. Applying it clearly while someone is watching you is a different skill. SpaceComplexity runs realistic mock interviews where you explain complexity analysis out loud, not just write it down. You get feedback on whether your explanation was clear, not just whether the number was right. Interviewers score communication separately from correctness. That gap is real.
For how memoization fits into the broader picture of bottom-up DP, memoization vs tabulation covers the tradeoffs.
Further Reading
- Dynamic programming (Wikipedia) - foundation and formal definition
- Overlapping subproblems (Wikipedia) - the condition that makes memoization worthwhile
- Optimal substructure (Wikipedia) - the second condition required for DP correctness
- Introduction to Algorithms, CLRS Chapter 15 - rigorous treatment of memoization and DP
- Memoization on GeeksforGeeks - worked examples with complexity tables