Backtracking Algorithm: Choose, Explore, Undo, and Never Miss a Solution

Every backtracking problem is a tree. You just don't see the tree until you start running the code.

The tree is implicit. You construct it node by node as you recurse, and you dismantle it in reverse as you undo your choices. The backtracking algorithm that walks it follows a loop so regular you can almost chant it: choose something, explore the consequences, then unchoose it and try the next option. That's the whole pattern. Three lines, plus a base case.

If that sounds too simple, it's because most explanations treat backtracking as a list of recipes (here's how to do N-Queens, here's how to do subsets) rather than as a single idea that generates those recipes automatically. The single idea is the loop. Everything else is a variation.

The Tree You Never See

Consider generating all permutations of [1, 2, 3]. There are six: [1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2], [3,2,1].

Imagine a tree. At the root, you haven't chosen anything. At the first level, you've chosen one element (three branches: pick 1, pick 2, pick 3). At the second level, you've chosen two (two branches each, since one is already used). At the leaves, all three are chosen and you have a complete permutation.

Decision tree for permutations of [1, 2, 3] showing 6 leaf nodes The implicit decision tree for [1, 2, 3]. DFS visits all 6 leaves by going deep, recording the result, and retracing.

This is the decision tree. It has n! leaves (the complete permutations) and roughly twice as many internal nodes. The backtracking algorithm is just a DFS over this tree. You go deep, hit a leaf, record the result, and retrace your steps to try the next branch. Knuth lays out this framing precisely in TAOCP Volume 4B, Section 7.2.2 "Backtrack Programming".

The "implicit" part means the tree doesn't exist in memory all at once. Each node is represented only by the current call stack frame. The recursion stack is the path from root to the current node, and the undo step is how you climb back up. That's why you need to undo: without it, you're permanently at the wrong node with no way home.

The Loop: Choose, Explore, Unchoose

Strip every backtracking solution to its skeleton and you get this:

def backtrack(path, choices):
    if is_complete(path):
        record(path)
        return

    for choice in choices:
        if not is_valid(choice, path):
            continue           # prune this branch entirely

        # choose
        path.append(choice)
        mark_used(choice)

        # explore
        backtrack(path, remaining_choices)

        # unchoose
        path.pop()
        unmark_used(choice)

The "choose" and "unchoose" steps happen around the recursive call. This is the pre-order and post-order position in a tree traversal. Making the choice before the call moves you to a child node. Undoing it after the call returns you to the parent. The loop tries each sibling in sequence.

The choose/explore/unchoose loop with call stack visualization The call stack IS the path from root to the current node. The undo step is how you climb back up the tree without actually allocating the tree.

The undo step is not optional, and it's the thing beginners most often forget. If you mutate path before the recursive call and don't reverse the mutation after, you'll find that your "path" at the second leaf contains state left over from the first leaf. The bug is silent. You get wrong results, not crashes.

There are two implementation styles: passing a mutable shared object (mutate then undo) or passing immutable snapshots (copy on each call). The mutable style is faster because it avoids O(n) copies at each level. The snapshot style is more readable. For interview purposes, either works. For performance, mutate and undo.

No Exit Condition

Before going further, here is a public health announcement:

Gru's plan meme: Learn to program, Make recursive function, No exit condition, No exit condition The recursion is calling itself. The meme is calling itself. Everything is fine. (r/ProgrammerHumor, 46k upvotes)

The base case is not decoration. It is the only reason the recursion ends. In backtracking, your base case is "the path is complete." Forget it and you get a stack overflow wearing wrong-answer clothing. Put it first. Check it before the loop.

The Shape of the Tree Determines the Complexity

Every backtracking problem has two numbers that matter: the branching factor (how many choices at each node) and the depth (how deep the tree goes). The worst-case node count is branching-factor^depth. But actual complexity depends on how you're counting and what you're generating.

Subsets (Power Set)

For each element, you make a binary choice: include it or skip it. Branching factor is 2, depth is n. The tree has 2^n leaves, one per subset.

Include/exclude tree for [1, 2, 3]:

           []
         /    \
       [1]    []
       / \    / \
    [1,2] [1] [2] []
     /\   /\   /\  /\
 [1,2,3][1,2][1,3][1][2,3][2][3][]

Generating all subsets costs O(n * 2^n): there are 2^n subsets, and copying each one onto the result list takes O(n) time.

Combinations

Choosing k elements from n, where order doesn't matter. You use a start index to avoid revisiting earlier elements. Branching factor shrinks as start advances, and you stop at depth k.

k=2 from [1,2,3,4]: start at each position, never go back

  start=1       start=2      start=3
  [1,2],[1,3],[1,4]   [2,3],[2,4]   [3,4]

The number of leaves is C(n,k). Total time is O(C(n,k) * k), paying O(k) per result copy.

Permutations

All orderings of n elements. You use a used[] array (or an in-place swap) to track which elements are still available. Branching factor starts at n and decreases by 1 at each level.

The tree has n! leaves. Total time is O(n * n!): n! permutations, each of length n to copy.

Why the branching factor changes across these three

Subsets always branch exactly 2 ways, regardless of position. Combinations start wide and narrow because you're forbidden from going backward in the array. Permutations always branch at the number of unused elements, so the branching factor steps down 1 per level: n * (n-1) * (n-2) * ... * 1 = n! total paths.

Side-by-side comparison of subset binary tree, narrowing combination tree, and shrinking permutation tree Same algorithm, three different tree shapes. The code structure is identical; only the "what choices are valid here" logic changes.

Problem	Tree shape	Leaves	Total time (including output)
Subsets of n	Complete binary, depth n	2^n	O(n * 2^n)
k-Combinations from n	Pruned, depth k	C(n,k)	O(k * C(n,k))
Permutations of n	Full, depth n, shrinking width	n!	O(n * n!)
Combination Sum (reuse allowed)	Depth T/M, width N	O(N^(T/M))	O(T/M * N^(T/M))
N-Queens	Depth N, width N, heavy pruning	<< N!	O(N!) worst case

The space complexity for all of these is O(n) for the recursion stack, ignoring the output. This is because the depth of the tree (and thus the maximum stack depth) is always bounded by the input size.

Pruning: Where the Real Wins Come From

Pure backtracking without pruning is brute-force enumeration. The algorithm is correct but slow. Pruning is how you cut branches off the tree before exploring them.

The rule is simple: if a partial solution already violates a constraint, no extension of it can be valid, so don't extend it.

Pruning diagram: unpruned tree visits all nodes, pruned tree skips entire subtrees early Left: brute force, every node visited, failures discovered at the leaves. Right: pruning, invalid subtrees cut at the root. Same result, far less work.

Three categories of pruning:

1. Hard constraint violations

Check the constraint before making the recursive call. If a queen placement threatens an existing queen, skip it. If adding the current number to the running sum already exceeds the target, break the loop (if the candidates are sorted). These are "prune early" checks in the validity condition.

# Without pruning: try all, throw out invalid results at the end
# With pruning: check the constraint before going deeper

for choice in choices:
    if violates_constraint(choice, path):
        continue   # entire subtree skipped
    path.append(choice)
    backtrack(path)
    path.pop()

For combination sum with sorted candidates, once sum + candidates[i] > target, every subsequent candidates[j] (j > i) also exceeds the target. So you can break instead of continue, cutting the rest of the loop.

2. Feasibility checks (forward-looking pruning)

Even if the partial solution is valid so far, it might be impossible to complete it. Example: if you need to pick k more elements but fewer than k remain in the array, there's no point continuing. This check costs almost nothing and can eliminate huge subtrees.

# In combination generation: if fewer elements remain than we still need, stop
if (n - start + 1) < (k - len(path)):
    return

3. Symmetry breaking and deduplication

When the input has duplicate elements, naive backtracking generates duplicate results. The fix: sort the input, then skip any candidate that is identical to the previous one at the same recursion level.

for i in range(start, len(nums)):
    if i > start and nums[i] == nums[i - 1]:
        continue   # skip duplicate sibling
    ...

This doesn't skip elements that are equal to something deeper in the tree (that would be wrong). It only skips siblings that would produce the same subtree.

N-Queens: Constraint Satisfaction in Action

N-Queens is the canonical hard constraint problem. Place n queens on an n×n board so no two queens share a row, column, or diagonal.

The backtracking strategy: place one queen per row (rows are the depth levels), and for each row, try each column (columns are the choices). The constraint check eliminates any position attacked by a previously placed queen.

A naive check scans the board. An optimized check tracks three sets: cols (occupied columns), diag1 (r - c is constant along left diagonals), diag2 (r + c is constant along right diagonals).

N-Queens board showing a queen at row 0, col 1 with column and diagonal attacks highlighted One queen, three sets. Every attacked cell in the column (red) and both diagonals (amber) blocked in O(1). No board scan needed.

def solve_n_queens(n: int) -> list[list[str]]:
    result = []
    queens = []  # queens[row] = col
    cols = set()
    diag1 = set()  # r - c
    diag2 = set()  # r + c

    def backtrack(row: int) -> None:
        if row == n:
            board = []
            for c in queens:
                board.append("." * c + "Q" + "." * (n - c - 1))
            result.append(board)
            return
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue  # pruned
            cols.add(col)
            diag1.add(row - col)
            diag2.add(row + col)
            queens.append(col)
            backtrack(row + 1)
            queens.pop()
            cols.remove(col)
            diag1.remove(row - col)
            diag2.remove(row + col)

    backtrack(0)
    return result

The tree has n levels (rows) and up to n branches per level. Without pruning, that's n^n nodes. With constraint checking, the actual nodes visited are far less than n!, which is itself far less than n^n. For n=8, there are exactly 92 distinct solutions out of n^n = 16,777,216 unconstrained placements (the solution count is OEIS A000170). Backtracking finds all 92 while visiting on the order of a few thousand nodes, with the exact count depending on row and column ordering.

The diagonal invariants (r - c and r + c) are the insight most solutions miss. Using them replaces an O(n) scan per placement with O(1) set lookups.

One Structure, Every Language

The canonical example: generate all permutations of a list of distinct integers. This demonstrates the full choose/explore/unchoose loop with a used[] array.

def permutations(nums: list[int]) -> list[list[int]]:
    result: list[list[int]] = []
    used = [False] * len(nums)

    def backtrack(path: list[int]) -> None:
        if len(path) == len(nums):
            result.append(path[:])
            return
        for i in range(len(nums)):
            if used[i]:
                continue
            used[i] = True
            path.append(nums[i])
            backtrack(path)
            path.pop()
            used[i] = False

    backtrack([])
    return result

Problems the Backtracking Algorithm Is Built For

Backtracking applies whenever you need to enumerate valid configurations over a combinatorial space and can detect invalid prefixes early. The broad categories:

Enumeration problems. Generate all subsets, permutations, combinations, or partitions of a set. There's no smarter algorithm than backtracking here. You need all of them.

Constraint satisfaction problems. N-Queens, Sudoku, crossword generation, graph coloring. The structure is: assign values to variables one at a time, check constraints after each assignment, backtrack on violation. Russell and Norvig give the canonical treatment in AIMA Chapter 6, "Constraint Satisfaction Problems". The constraint check is where all the performance lives.

Path-finding in implicit graphs. Word search on a grid, maze solving, finding a Hamiltonian path. The "graph" is usually a 2D grid. The state includes the visited cells in the current path, not a global visited set, because different starting paths can legitimately share the same cell.

Partition and subset-sum problems. Partition to K equal subsets, combination sum, palindrome partitioning. These have the choose/skip structure of subset generation combined with a numerical target constraint.

Expression and string problems. Generate parentheses, letter combinations of a phone number, IP address restoration. The branching structure follows the shape of the string being built. Compare this with the linear-scan family in The Sliding Window Algorithm, where the answer is one window not an enumeration of all of them.

How to Spot a Backtracking Problem

Two questions tell you whether backtracking applies. Do you need all valid configurations, or just one? Can you detect failure early, before fully building each candidate?

If both answers are yes, you're looking at a backtracking problem.

Four concrete signals in the problem statement:

"Generate all..." or "Find all..."
"Return all valid..." with explicit constraints
A combinatorial structure where order matters (permutations) or doesn't (subsets, combinations)
A grid where you move and must not revisit the same cell in the current path

Worked example 1: Letter Combinations of a Phone Number

Problem: Given a string of digits (2-9), return all possible letter strings they could represent on a phone keypad. "23" gives ["ad","ae","af","bd","be","bf","cd","ce","cf"].

Why backtracking? You're building a string character by character. At each position, you choose a letter from the digit's mapping. The number of branches at each node is the size of the mapping (2-4 letters). Depth is the number of digits. You need all combinations.

Tree shape: Each digit adds one level, each letter is one branch. Total leaves: product of mapping sizes. For "23", that's 3 * 3 = 9.

def letter_combinations(digits: str) -> list[str]:
    if not digits:
        return []

    keypad = {
        "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    }
    result = []

    def backtrack(index: int, path: list[str]) -> None:
        if index == len(digits):
            result.append("".join(path))
            return
        for letter in keypad[digits[index]]:
            path.append(letter)
            backtrack(index + 1, path)
            path.pop()

    backtrack(0, [])
    return result

No used[] array here because you always advance forward (index increases). No pruning either. All paths lead to valid completions. This is the cleanest possible backtracking problem.

Worked example 2: Word Search on a Grid

Problem: Given an m×n character grid and a word, return true if the word can be found by sequentially adjacent (horizontal or vertical) cells, without reusing any cell.

Why backtracking? You're building a path through the grid. At each step, you choose one of four neighbors. The constraint is that you can't revisit a cell within the current path. You can prune early if the current cell doesn't match the corresponding character of the word.

The key insight: This is DFS on a grid, but you need per-path visited tracking, not global. A global visited array would prevent revisiting a cell across different paths, which is wrong. You mark cells visited before the recursive call and unmark them after. That's exactly choose/explore/unchoose.

def word_search(board: list[list[str]], word: str) -> bool:
    rows, cols = len(board), len(board[0])

    def backtrack(row: int, col: int, index: int) -> bool:
        if index == len(word):
            return True
        if row < 0 or row >= rows or col < 0 or col >= cols:
            return False
        if board[row][col] != word[index]:
            return False  # pruned: wrong character

        original = board[row][col]
        board[row][col] = "#"  # mark visited (choose)

        found = (
            backtrack(row + 1, col, index + 1) or
            backtrack(row - 1, col, index + 1) or
            backtrack(row, col + 1, index + 1) or
            backtrack(row, col - 1, index + 1)
        )

        board[row][col] = original  # restore (unchoose)
        return found

    for r in range(rows):
        for c in range(cols):
            if backtrack(r, c, 0):
                return True
    return False

The "visited" marker overwrites the cell inline rather than using a separate boolean grid. This works because the character at that cell won't be #, so any revisit attempt during the same path is automatically pruned by the character check.

A Bug You Will Write

Here's the version that looks right but isn't:

def subsets_wrong(nums):
    result = []

    def backtrack(start, path):
        result.append(path)   # BUG: appending the list object, not a copy
        for i in range(start, len(nums)):
            path.append(nums[i])
            backtrack(i + 1, path)
            path.pop()

    backtrack(0, [])
    return result

All entries in result will be the same empty list. You appended the list reference, not its contents. Every subsequent mutation of path mutates the objects already stored in result.

The fix is result.append(path[:]) or result.append(list(path)). Any time you collect a mutable structure into results, you must copy it at the moment of collection, not before or after.

This exact bug shows up in N-Queens (collecting the board), in combination sum (collecting candidates), in every enumeration problem. Burn it into memory.

Backtracking or DP?

Interviewers love asking why a problem is one and not the other, so be ready. Both explore a space of choices. What separates them is whether subproblems repeat.

In backtracking, the state at each node is the path you took to get there, the specific sequence of choices made so far. Two paths that arrive at the same "position" are different nodes in the tree, so there's nothing to cache. You can't memoize an exploration of (row=3, col=2) in word search, because the visited set is different every time you arrive there.

If you find yourself writing backtracking code that solves the same subproblem repeatedly and the subproblem's result doesn't depend on the path you took, switch to DP. The canonical test: can you define the subproblem purely by position (without needing the path)? If yes, DP. If no, backtracking.

For more on recognizing when DP applies, see Dynamic Programming Is Just Recursion With a Memory.

The Hidden Cost in the Complexity

One thing that gets hand-waved in most complexity analyses: the O(n) copy you pay at every leaf of the tree is not free. Walk through the derivation for subsets and permutations in Backtracking Time Complexity if the next two paragraphs feel handwavy.

For permutations: n! leaves, O(n) copy each = O(n * n!) total time. For n=10, that's 36,288,000 copy operations just for the output, before counting the internal traversal.

For subsets: 2^n leaves, O(n) copy each = O(n * 2^n). For n=20, that's 20,971,520 copy operations.

This matters in two situations. First, if you're only asked to count the results rather than enumerate them (like "how many valid parenthesizations exist"), you can skip the copy entirely and the algorithm gets dramatically cheaper. Second, if the problem has a "return any one valid solution" variant, you can return early on the first leaf hit and avoid the full traversal.

The complexity numbers in the operations table above already include the copy cost. If your interview solution only counts or checks existence, knock the O(n) factor off.

Recap

Every backtracking problem is DFS on an implicit decision tree you construct via recursion.
The loop is always: choose, explore, unchoose. The unchoose step is mandatory and must perfectly mirror the choose step.
The complexity is O(b^d) for the tree traversal, plus the cost of collecting results (usually O(n) per leaf).
Subsets cost O(n * 2^n), permutations cost O(n * n!), k-combinations cost O(k * C(n,k)).
Pruning cuts branches before exploring them, using hard constraints, feasibility checks, or symmetry breaking.
When collecting mutable structures into a result list, always copy them at the leaf, not the reference.
Backtracking and DP are different tools: backtracking when you need all configurations and no subproblems repeat, DP when there are overlapping subproblems and you want the optimum.
N-Queens, word search, and combination sum are the three archetypes. Learn those shapes. Most interview backtracking problems are variations.

If you want to practice articulating these patterns under interview conditions, SpaceComplexity runs voice-based mock interviews with rubric-driven feedback on exactly how you explain your pruning decisions and complexity reasoning. Speaking the choose/explore/unchoose loop out loud while coding is different from writing it in silence.

Backtracking questions live in a zone where the code is short but the explanation decides the round. Train the explanation.