What Is Tree DP? Dynamic Programming on Trees, Explained

Most tree problems are DFS in disguise. You visit a node, recurse into children, do something. Tree DP is DFS too. Same skeleton. Same postorder pattern. Same base case on null.

The difference is exactly one thing: what you return from the recursion carries meaningful state upward, and the parent uses that state to make a decision.

If you can write a postorder DFS, you can write tree DP. The hard part, the only hard part, is knowing what to return.

DFS vs tree DP: they're the same picture

DFS vs tree DP: they're the same picture. The return value changes everything.

What Tree DP Actually Is

Dynamic programming always breaks a problem into overlapping subproblems, solves each once, and combines the results. Dynamic programming on trees uses subtrees as the subproblems: each subtree solves itself and hands a state upward to its parent.

The parent combines what its left subtree reported and what its right subtree reported to compute its own answer.

The traversal order is postorder. Left child first. Right child next. Then the current node. You never process a node until both subtrees have resolved their states. If you need a refresher on traversal orders, the binary tree traversal post covers all four.

This is bottom-up DP applied to a tree-shaped dependency graph.

The Skeleton Never Changes

Every tree DP problem follows the same depth-first search structure, with a richer return value:

def solve(node):
    if not node:
        return base_case

    left_state = solve(node.left)
    right_state = solve(node.right)

    return combine(node.val, left_state, right_state)

base_case is the return value for a null node. combine is the logic that computes this node's state from its children. Different problems fill those in differently. The shape is always the same.

You have probably written this function before without realizing it. The only thing missing was a return value worth keeping.

Warmup: Diameter of a Binary Tree

LeetCode 543 asks for the longest path between any two nodes in the tree.

The path through any given node equals the height of its left subtree plus the height of its right subtree. So the DP state you propagate upward is height. The answer you actually care about, the diameter, is a side effect updated at each node.

def diameter_of_binary_tree(root: TreeNode) -> int:
    max_diameter = 0

    def height(node: TreeNode) -> int:
        nonlocal max_diameter
        if not node:
            return 0
        left = height(node.left)
        right = height(node.right)
        max_diameter = max(max_diameter, left + right)
        return 1 + max(left, right)

    height(root)
    return max_diameter

height() returns height. The diameter is updated as a side effect. This pattern, where the return value and the tracked answer are different things, shows up constantly in tree DP. Your function is lying about what it does, and that is correct.

Two States Per Node, One Function

Diameter uses one state per node. The harder and more representative class carries multiple values up through each node.

LeetCode 337, House Robber III, is the canonical example.

You have a binary tree where each node holds some amount of money. The constraint: you cannot rob two adjacent nodes. Maximize the total theft.

At every node, you have two choices. Rob it, or skip it. If you rob it, neither child can be robbed. If you skip it, each child can independently be robbed or skipped, whichever is better.

So the state returned from each node is a pair: the best total assuming this node is robbed, and the best total assuming this node is skipped.

def rob(root: TreeNode) -> int:
    def dp(node: TreeNode) -> tuple[int, int]:
        if not node:
            return (0, 0)  # (rob_this, skip_this)

        left_rob, left_skip = dp(node.left)
        right_rob, right_skip = dp(node.right)

        rob_this = node.val + left_skip + right_skip
        skip_this = max(left_rob, left_skip) + max(right_rob, right_skip)

        return (rob_this, skip_this)

    return max(dp(root))

Trace through a small example:

Leaf 3 (child of node 2): returns (3, 0). Leaf 1 (child of right 3): returns (1, 0).

Node 2: rob_this = 2 + 0 + 0 = 2, skip_this = max(0,0) + max(3,0) = 3. Returns (2, 3). Node 3 (right): rob_this = 3 + 0 + 0 = 3, skip_this = max(0,0) + max(1,0) = 1. Returns (3, 1).

Root 3: rob_this = 3 + 3 + 1 = 7, skip_this = max(2,3) + max(3,1) = 6. Answer: max(7, 6) = 7.

Rob the root (3), skip both direct children, rob the grandchildren (3 and 1). Total = 7.

How to Define the State

This is where people get stuck. The tuple is the hardest part to figure out, and it is where most candidates spiral in an interview.

The state is the minimum information a node needs to hand upward so its ancestors can make correct decisions. Not the final answer. What options this subtree leaves open.

For House Robber: the parent needs to know the best outcome when this child was robbed and when it wasn't. The parent's own choice determines which of those it can use.

For diameter: the parent only needs height from each child. That is enough to compute the candidate diameter at the parent level.

For Binary Tree Maximum Path Sum (LeetCode 124): you return the best single-arm path starting at this node going down. The global answer, tracked separately with a nonlocal variable, is the best two-arm path that could pass through any node.

Each problem differs. The process is the same: ask what the parent needs, not what the final answer is.

O(n) Time, O(h) Space, No Memo Table

Time is O(n). Each node is visited exactly once. The work per node is O(1), combining two child states. Total: O(n).

This is tree DP's main advantage. Brute force for House Robber III would try exponentially many subsets. Tree DP gets it linear.

Writes recursive algorithm / no stack overflow

Tree DP: visits every node once, needs no memoization table, and still somehow gets the right answer.

Space is O(h) for the call stack, where h is tree height. Balanced tree: O(log n). Skewed tree (a linked list in a trench coat): O(n). No explicit memoization table is needed. Because the tree has no cycles, each subtree is solved exactly once, so there is nothing to cache.

Unlike grid DP where you need an m x n table, the call stack is your table.

When to Reach for Tree DP

Three signals in the problem statement:

The answer depends on combining values from both subtrees at every node. If you find yourself thinking "at each node, I need to know what happened below left and below right," that is tree DP.

There is a constraint between parent and child. House Robber has the adjacency rule. Other examples include counting subtree nodes satisfying a property, or finding paths where all edges meet some condition. The constraint travels upward through the returned state.

The phrase "any path" appears. Diameter, maximum path sum, longest univalue path. These all compute a candidate answer at every internal node by combining left and right child information.

If you see those signals, your DFS needs to return a state instead of void.

Tree DP Is Just Bottom-Up DP

Tree DP is bottom-up DP on a DAG that happens to be a tree. Each node is a state. Each edge is a dependency. You compute states in topological order, which for a tree is postorder traversal.

There are no repeated subproblems because the tree structure guarantees each subtree is reachable from exactly one parent. That is why you need no memoization. The recursion vs dynamic programming post explains the general relationship; trees make it concrete because the subproblem graph is explicit.

If the graph had cycles, the same approach would break. Trees are acyclic by definition. That is what makes O(n) tree DP possible.

Interview Tips

Define your state before writing code. Write a one-line comment: "this function returns (rob this node, skip this node)." If you cannot say it in one sentence, you have not defined it yet. Interviewers watching you struggle to define the tuple will dock you on problem-solving, even if you eventually get the code right.

Handle null explicitly. The base case is always a null node returning a neutral value: (0, 0), 0, float('-inf'), depending on the problem. Get that right and the recursion composes cleanly.

Track the global answer separately when the return value and the answer are different things. Diameter and maximum path sum both need a nonlocal variable updated at each node. The function returns height or a single-arm path. The actual answer is a side effect.

Practice in order: LeetCode 543, then 337, then 124. That sequence builds the complete tree DP pattern, from single-state to multi-state to "return one thing, track another." These three problems cover the full range you will see in an interview.

SpaceComplexity lets you practice these problems in a timed voice mock interview. The gap that shows up most often is not the code. It is explaining why the tuple has two elements. Saying "this returns rob and skip" out loud, under pressure, to a human (or an AI that scores your reasoning) is different from writing it in silence.