Bottom-Up Dynamic Programming: Read the Complexity Off the Table, Then Halve the Space

You solved the problem with memoization. It's correct. It's O(n²) time and you cached every subproblem you've ever visited. The interviewer nods, then asks: "Can you reduce the memory?"

Most people know it's possible. Most can't explain why it works or where the backward scan comes from.

This is the last post in the recursion complexity series. The dynamic programming framework post covers when DP applies and how to identify subproblems. Here we flip memoization into bottom-up dynamic programming, read time and space directly from the table structure, and shrink the space without touching the time. Three worked examples, one non-obvious twist.

Meme: "No problem Dynamic Programming", the classic dad-joke format, now a CS concept

Dad jokes, dynamic programming, same energy. You cache the punchline and reuse it.

Tabulation vs Memoization: The Table Is the Recursion Tree, Rotated

Memoized recursion is top-down: start at the answer you want and recurse toward base cases, caching every result. Tabulation is bottom-up: start at the base cases and build forward, filling cells in the order the subproblems depend on each other.

The conversion rule: find the topological order of the subproblem dependency graph, then fill cells in that order. For one-dimensional problems this is usually a single loop from small indices to large. For two-dimensional problems it's nested loops, typically row by row, left to right within each row.

Fibonacci makes this concrete:

# Memoized (top-down)
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

# Tabulated (bottom-up)
def fib(n: int) -> int:
    if n <= 1:
        return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

Both compute the same n subproblems. Tabulation just does it without a call stack, filling cells in the same order the recurrence arrows point.

Fibonacci recursion tree (left) with repeated overlapping subtrees highlighted vs. a left-to-right array sweep (right) labeled dp[i] = dp[i-1] + dp[i-2]

Same values computed, opposite directions. The tree visits f(3) three times. The array visits every index exactly once.

One practical upside of tabulation: no stack overflow risk on large inputs. Python's default recursion limit is 1000 frames. A memoized solution for fib(5000) crashes. The loop doesn't. That's not a theoretical win.

Reading Time and Space Straight Off the Table

Once the table structure is clear, complexity falls out without needing the Master Theorem or a substitution proof.

Time = (number of cells) × (work to fill each cell). Space = size of the table you actually keep.

For Fibonacci: n+1 cells, O(1) work each. Time O(n), space O(n). Full analysis, two lines.

For Longest Common Subsequence (LCS) of strings of length m and n, the recurrence is:

dp[i][j] = LCS length of text1[:i] and text2[:j]

if text1[i-1] == text2[j-1]:   dp[i][j] = dp[i-1][j-1] + 1
else:                            dp[i][j] = max(dp[i-1][j], dp[i][j-1])

The table has (m+1) rows and (n+1) columns. Each cell does a character comparison and a max, both O(1). Time: O(mn). Space: O(mn).

For 0/1 Knapsack with n items and capacity W:

if weight[i-1] > w:   dp[i][w] = dp[i-1][w]
else:                  dp[i][w] = max(dp[i-1][w], dp[i-1][w - weight[i-1]] + value[i-1])

The table is (n+1) × (W+1). Each cell: O(1). Time: O(nW). Space: O(nW).

Count the table cells, multiply by the work at each cell. If you can sketch the table dimensions and describe what each transition does, you can name the complexity before writing the loop.

LCS grid (left) showing arrows from each cell to the three it reads: diagonally above-left for a match, directly above, and directly left. Knapsack grid (right) showing each cell reading only the row directly above.

LCS needs three neighbors. Knapsack only needs the row above. That difference is why they compress differently.

The Rolling Array: Only Keep What Future Cells Need

Look at the Fibonacci recurrence again. dp[i] uses dp[i-1] and dp[i-2]. Once you compute dp[i], nothing ever reads dp[i-3] or anything older. Those cells are deadweight.

If no future cell looks back more than k steps, you only need k values in memory. For Fibonacci that's two:

def fib(n: int) -> int:
    if n <= 1:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

Space: O(1). Time: still O(n), the same n-1 additions, the same number of loop iterations. Only what you remember changes, not how much computation you do.

Fibonacci array with a sliding window of width 2 moving left to right. Cells behind the window are grayed out. Only "prev" and "curr" are solid, with the label "O(n) space" crossed out and replaced with "O(1) space".

The window slides. Everything behind it is gone. The answer at the front is the same.

The same logic applies to any 2D table where row i depends only on row i-1. LCS fits: dp[i][j] reads dp[i-1][j-1], dp[i-1][j], and dp[i][j-1]. Only the previous row and the already-filled part of the current row are needed.

def lcs(text1: str, text2: str) -> int:
    # Keep text2 as the shorter string so the arrays stay small
    if len(text1) < len(text2):
        text1, text2 = text2, text1
    m, n = len(text1), len(text2)

    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            if text1[i - 1] == text2[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]

Space: O(min(m,n)), reduced from O(mn). Time: still O(mn). Every (i,j) cell is visited exactly once. You just discard rows as you go.

LCS 2D grid with two highlighted rows: "prev" (row i-1) and "curr" (row i, being filled left to right). All rows above are faded and labeled "freed". An arrow shows "prev" sliding downward after each outer iteration.

You keep two rows. Everything above is freed. After each outer loop, prev slides down to where curr was.

The Backward Scan: 0/1 Knapsack's Non-Obvious Twist

0/1 Knapsack has the same row structure (row i depends only on row i-1), so it should collapse to a single array just like LCS. It does. But the scan direction matters, and getting it wrong silently turns the problem into something different.

Here's what happens when you scan forward (w increasing from 0 to W):

# WRONG for 0/1 knapsack
for i in range(n):
    for w in range(weights[i], capacity + 1):  # forward scan
        dp[w] = max(dp[w], dp[w - weights[i]] + values[i])

When you compute dp[w] and read dp[w - weights[i]], that index is smaller than w. If the scan is going left to right, you've already updated it for item i this pass. You might add item i again. That's unbounded knapsack, where each item can be used unlimited times. The code compiles, runs, produces a number, and it's wrong. A plausible-looking number. This is the worst kind of bug.

Scanning backward fixes it: when you compute dp[w], the index dp[w - weights[i]] hasn't been touched yet this pass, so it still holds the value from the previous item's row.

def knapsack(weights: list[int], values: list[int], capacity: int) -> int:
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        for w in range(capacity, weights[i] - 1, -1):  # backward scan
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[capacity]

Space: O(W). Time: still O(nW). The inner loop visits the same (i,w) pairs, just in reverse capacity order.

Two horizontal dp arrays side by side. Top: forward scan with a right-pointing arrow. The cell dp[w - weight[i]] is to the left of dp[w], highlighted red, labeled "already updated this pass (item counted twice)." Bottom: backward scan with a left-pointing arrow. dp[w - weight[i]] is to the right of dp[w], highlighted green, labeled "not yet touched, still row i-1 value."

One direction counts each item at most once. The other direction lets items stack freely. Same code, different arrow, different problem.

Practice saying this out loud: the backward direction guarantees that reading dp[w - weight[i]] during item i's pass gives you the row-i-1 snapshot, not an already-updated value.

What Space Optimization Costs You

Rolling arrays trade memory for the ability to reconstruct the solution.

With the full O(mn) LCS table, backtracking from dp[m][n] to dp[0][0] recovers the actual subsequence character by character. At each cell you check where the value came from (diagonal match, up, or left) and follow the path. With two rolling rows, that history is gone. You get the length, not the sequence.

If the problem asks for a length or a count, rolling arrays are safe. If it asks you to return the actual subsequence or the list of chosen items, you need the full table.

One workaround exists for LCS: Hirschberg's algorithm (1975) uses divide-and-conquer on the grid. It computes the midpoint of the optimal path using two rows at a time, recurses on both halves, and concatenates the result. Time stays O(mn), space stays O(m+n). The full LCS string is recovered without ever storing the full grid. It's rarely asked in interviews, but worth knowing the reconstruction tradeoff isn't fundamental.

When Tabulation Doesn't Help With Space

Some DP problems have exponential complexity not because of unoptimized recursion but because the state space is exponential by nature.

Bitmask DP for the Traveling Salesman Problem has O(n² × 2^n) states, one per (current city, subset of visited cities) pair. You can absolutely tabulate this. The table has 2^n columns. But rolling arrays won't help because the subsets don't have a clean linear dependency where "subset layer k depends only on layer k-1." Subset transitions jump around in arbitrary patterns.

Exponential state space means exponential time and space, tabulated or not. The conversion from memoization to tabulation changes overhead constants and avoids stack depth issues, but it cannot shrink the fundamental state count.

The Recipe in One Place

Write the recurrence. What does dp[state] depend on?
Identify fill order. Topological order of the dependency graph, smallest subproblems first.
Count cells and work per cell. This gives you time and space for the full table.
Check the look-back horizon. Does row i depend only on row i-1 (or a fixed number of prior rows)? If yes, roll to that many rows.
Check scan direction. Within the rolled array, does reading an already-updated index cause the current item to be reused? If yes, reverse the scan.

Step 5 is where most people stop halfway. They apply the rolling array and forget to check whether forward or backward fill is correct. The result runs but silently computes the wrong problem.

Flowchart: Write recurrence → Identify topological fill order → Count cells × work (= time) and cells retained (= space) → Diamond: "Row i needs only row i-1?" → YES: Roll to two rows → Diamond: "Same-row scan reads own updates?" → YES: Scan backward → NO: Scan forward.

The full decision tree. Most interviews will only push you to step 4. The backward scan question is how you separate yourself.

If you want an interviewer to push you on this depth of reasoning, SpaceComplexity runs voice-based mock interviews with follow-up questions that don't stop at "what's the time complexity." They'll keep asking until you can explain the backward scan out loud under pressure.

Recap

Tabulation fills subproblems in topological order, smallest first. No call stack, no recursion overhead.
Time = table cells × work per cell. Space = cells you retain. Both read off the table structure directly.
Rolling arrays keep only the frontier, the cells future transitions actually reference. Time is unchanged.
Fibonacci: O(n) space to O(1). LCS: O(mn) to O(min(m,n)). 0/1 Knapsack: O(nW) to O(W).
0/1 Knapsack backward scan prevents item reuse within one pass. Forward scan silently becomes unbounded knapsack.
Space optimization loses reconstruction ability. Keep the full table when you need to trace back the actual solution.
Exponential state spaces (bitmask DP) stay exponential. Tabulation and rolling arrays change constants, not complexity class.