Top-Down vs Bottom-Up Dynamic Programming: Both Correct. One Runs Out of Stack.

You write the memoized recursive solution to a dynamic programming problem. It passes. Someone tells you bottom-up is "the right way," so you convert it. That also passes. You conclude the choice is cosmetic.

Then you get a knapsack problem where n = 1000 and W = 100000. The bottom-up table needs 100 million cells. The top-down version visits maybe 3 million of them, because most weight targets are unreachable with your actual input distribution. The choice just decided whether your solution uses 400 MB or 12 MB.

Both approaches traverse the same subproblem graph, just in opposite directions. That direction determines what you can compress, what can crash, and which constant factor you pay.

The Same Graph, Two Directions

Every DP problem has an underlying directed acyclic graph (DAG) of subproblems. An edge from A to B means "A depends on B." The root is the original problem. The leaves are the base cases.

Top-down starts at the root and recurses toward the leaves, caching each result on the way back up. This is memoization. It is lazy: a node is only visited when something depends on it.

Bottom-up starts at the leaves and fills toward the root, computing every node in topological order. This is tabulation. It is eager: every reachable node gets computed whether or not the root actually needs it.

The graph is the same. The traversal direction is not.

Two-panel diagram: left shows a recursive call tree with some nodes dimmed (top-down memoization, lazy); right shows a linear array filled left-to-right (bottom-up tabulation, eager)

Left: top-down recurses lazily, skipping nodes already cached. Right: bottom-up fills every cell in order, no call stack required.

Two Approaches, One Problem

Coin change makes the comparison concrete. Given coins = [1, 5, 10, 25] and amount = 41, find the minimum number of coins.

Top-down: start at 41, recurse into every sub-target, cache the result.

from functools import cache

def coin_change(coins: list[int], amount: int) -> int:
    @cache
    def dp(remaining: int) -> int:
        if remaining == 0:
            return 0
        return min(
            (1 + dp(remaining - c) for c in coins if c <= remaining),
            default=float('inf'),
        )

    result = dp(amount)
    return result if result < float('inf') else -1

The call dp(41) branches into dp(40), dp(36), dp(31), dp(16). Each recurses further. @cache intercepts any repeated call and returns the stored answer immediately.

Bottom-up: fill dp[0] through dp[41] in order, using previously solved sub-targets.

def coin_change(coins: list[int], amount: int) -> int:
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0
    for i in range(1, amount + 1):
        for c in coins:
            if c <= i:
                dp[i] = min(dp[i], 1 + dp[i - c])
    return dp[amount] if dp[amount] < float('inf') else -1

By the time we need dp[41], every entry from 0 to 40 is already solved. No recursion, no stack.

Same answer. Same O(amount × coins) time. The bottom-up version uses a plain array instead of a hash map, pays no function-call overhead, and accesses memory sequentially. For small inputs these differences are noise. For large ones, they compound.

What the Complexity Numbers Actually Say

The time formula is identical for both: O(distinct states × work per state). CLRS states that the two approaches have the same asymptotic running time "except in unusual circumstances," and those circumstances are exactly the cases where one wins decisively.

	Top-down	Bottom-up
Time	O(states × work)	O(states × work)
Space (naive)	O(states) memo + O(depth) call stack	O(states) table
Space (compressed)	O(states), cannot reduce further	O(look-back window) via rolling array
Constant factor	Higher (hash map + function calls)	Lower (array + loops)
Sparse state spaces	Only visits needed states	Fills all cells regardless
Stack overflow risk	Yes, at ~1000 frames in Python	None

The constant factor gap is real. Python function-call overhead is roughly 1 microsecond per frame. A problem with 200,000 recursive states spends about 0.2 seconds on overhead alone, before computing anything. Bottom-up pays none of that.

When Top-Down Is the Right Choice

Top-down wins when only a fraction of the state space is reachable from the actual input.

Take the knapsack example above: with n = 1000 items and W = 100000, the bottom-up table allocates 100 million cells and fills every one. If item weights average 200 and most target capacities are unreachable, top-down might visit 5 percent of those cells. Both are O(nW) in the worst case. On the specific input you have, top-down is faster.

Interval DP is the other natural home for top-down. Problems like matrix chain multiplication, burst balloons, and optimal BST construction decompose an interval [i, j] by trying each split point k:

@cache
def dp(i: int, j: int) -> int:
    if i == j:
        return base_case(i)
    return min(
        dp(i, k) + dp(k + 1, j) + cost(i, j, k)
        for k in range(i, j)
    )

The recursion naturally handles ordering. You do not have to figure out what the topological order of the subproblems is. Top-down gets it for free from the call graph.

When Bottom-Up Is the Right Choice

Bottom-up wins when the problem needs all states and you want to compress space.

The key technique is the rolling array. Many DP tables have a regular dependency structure: row i depends only on row i-1. You do not need to keep all n rows. Keep one.

The 0/1 knapsack is the textbook example. Naively it is O(n × W) space:

# O(n × W) space
def knapsack_full(weights: list[int], values: list[int], W: int) -> int:
    n = len(weights)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            dp[i][w] = dp[i - 1][w]
            if weights[i - 1] <= w:
                dp[i][w] = max(dp[i][w], values[i - 1] + dp[i - 1][w - weights[i - 1]])
    return dp[n][W]

Row i only reads from row i-1. Compress to a single row:

# O(W) space
def knapsack_compressed(weights: list[int], values: list[int], W: int) -> int:
    dp = [0] * (W + 1)
    for weight, value in zip(weights, values):
        for cap in range(W, weight - 1, -1):  # backward scan is non-negotiable
            dp[cap] = max(dp[cap], value + dp[cap - weight])
    return dp[W]

The backward scan is the correctness guarantee, not a style choice. When you process capacity cap downward, dp[cap - weight] still reflects the state before the current item was considered, because you have not updated that index yet in this pass. If you scanned forward, dp[cap - weight] would already be updated, meaning you could use the same item multiple times. That gives you the unbounded knapsack. Same problem shape. Different semantics. Both compile, both run, one gives the wrong answer on every input where an item could appear more than once.

Top-down cannot do this. The memo cache holds all computed states simultaneously until the recursion fully unwinds. There is no concept of "I no longer need state (i-2, w)." The call stack might still have a frame that depends on it.

Two-panel diagram: left shows a full n×W grid with all cells filled and a 400MB memory annotation; right shows a single row scanned backward (high-to-low), with write and read cursors, and a 0.4MB annotation

Left: the naive O(n × W) table. Right: one row, scanned high-to-low each pass. The read position is always to the left of the write position, which is what keeps this from accidentally solving the unbounded knapsack.

Python's Recursion Limit Is Not a Suggestion

Python's default recursion limit is 1000 frames. That sounds like a lot until you write a DP over a sequence of length 1,500 and meet RecursionError: maximum recursion depth exceeded. Raise it with sys.setrecursionlimit() and you are buying time, not fixing anything.

The OS stack on macOS and Linux is 8 MB for the main thread, 512 KB for spawned threads. A Python frame costs 100 to 200 bytes of overhead. At depth 10,000 inside a thread, the OS kills the process with a segfault. No Python traceback. No helpful message. The 1,000-frame software limit exists specifically to turn that silent OS kill into a noisy Python exception. You are supposed to see it and switch approaches, not set the limit to 100,000 and hope.

Bottom-up avoids this completely. For loops do not consume call stack. If a problem's natural recursion tree has depth proportional to input size, such as a string of length 5,000 or a DP over a sequence of 3,000 elements, bottom-up is the safe choice by default.

Interval DP Fill Order: Where Bottom-Up Requires Thought

For interval DP problems, bottom-up needs an explicit length-based sweep. Shorter intervals before longer ones, because a length-k interval depends on intervals of length less than k.

n = len(arr)
dp = [[0] * n for _ in range(n)]

for length in range(2, n + 1):
    for start in range(n - length + 1):
        end = start + length - 1
        dp[start][end] = min(
            dp[start][k] + dp[k + 1][end] + cost(start, end, k)
            for k in range(start, end)
        )

Length must be the outer loop, and getting that wrong produces silent incorrect answers. If you iterate by start index first, some inner dependencies are not yet filled when they are first needed. The code runs. It returns a number. The number is wrong. This is the worst kind of bug.

Two Questions to Decide

First: does the problem visit all states, or only some? If the input makes large regions of the state space unreachable, top-down is faster in practice. If all states are needed, bottom-up wins on constant factor.

Second: is memory the binding constraint? Rolling arrays reduce O(n × W) to O(W) and O(mn) to O(min(m, n)). Only bottom-up can do this. If you are on a memory budget, that ends the discussion.

Everything else is a tiebreaker. Python recursion depth favors bottom-up. Interval DP with complex split logic favors top-down for readability. If neither question has a clear answer, bottom-up is the safer default.

The Decision, Summarized

Top-down (memoization) starts at the original problem, recurses toward base cases, caches results. Bottom-up (tabulation) starts at base cases, fills toward the answer, iterates.
Both run in O(states × work per state). Asymptotically identical.
Bottom-up has a lower constant factor: array access instead of hash map, no function call overhead.
Top-down handles sparse state spaces better. It is lazy, computing only what the input actually needs.
Rolling arrays compress O(nW) to O(W) for knapsack and O(mn) to O(n) for LCS. Only bottom-up supports this.
The backward scan in 0/1 knapsack prevents item reuse. Forward scan gives unbounded knapsack. They look the same in a diff. They are not the same problem.
Python's recursion limit is 1000 frames. Top-down on deep inputs crashes. Bottom-up does not.
For interval DP, top-down recurses naturally. Bottom-up requires explicit length-ordered loops.

Explaining these tradeoffs under pressure, in real time, is harder than it looks on paper. SpaceComplexity runs voice-based DP interviews where you justify your top-down or bottom-up choice before you write a line of code.

For more on how each approach relates to the subproblem graph, see Memoization vs Tabulation. For the rolling-array mechanics across LCS and knapsack, see Bottom-Up Dynamic Programming. For how to recognize whether DP applies at all, see When to Use Dynamic Programming.