What Are Catalan Numbers? The Counting Pattern Behind BSTs and Brackets

June 17, 20269 min read
dsaalgorithmsinterview-prepdynamic-programming
What Are Catalan Numbers? The Counting Pattern Behind BSTs and Brackets
TL;DR
  • Catalan numbers count recursive structures that split into two independent halves: valid bracket sequences, distinct BST shapes, and polygon triangulations.
  • The recurrence C(n) = sum of C(i) × C(n−1−i) is more useful than the closed form because it shows exactly why these numbers appear.
  • LeetCode 96 (Unique Binary Search Trees) is the canonical Catalan DP: the number of structurally distinct BSTs with n nodes is exactly C(n).
  • Counting vs generating is the key distinction: counting takes O(n²) DP; generating all structures takes O(C(n) × n), which is exponential.
  • Recognition signal: if your recurrence naturally produces dp[i] += dp[j] * dp[i-1-j] and you're summing over all splits, it's a Catalan problem.
  • Growth rate: C(n) scales as 4^n / (n^(3/2) × √π), putting C(20) above 6 billion — never enumerate at scale.

If you've solved LeetCode 96, you got the answer for n=3 and moved on. You verified it was 5, felt briefly smart, and moved on. You never asked why it's 5 and not 4 or 6. The answer for n=4 is 14. For n=5 it's 42. Yes, that 42. There's a formula behind those numbers, and once you see it, a whole class of interview counting problems goes from "how would I even start" to "I know exactly what recurrence to write."

Catalan numbers count structures that grow by splitting into two independent halves: valid bracket sequences, distinct binary tree shapes, ways to triangulate a convex polygon. The sequence is 1, 1, 2, 5, 14, 42, 132, 429, and it shows up whenever a recursive structure of size n decomposes into a left piece of size i and a right piece of size n-1-i.

This is for engineers who want to recognize Catalan-numbered problems in interviews and derive the DP recurrence from scratch, not just recall it.

Five Sequences, Three Pairs

Start with something you can count by hand. How many ways can you arrange 3 pairs of parentheses so the result is valid?

((()))
(()())
(())()
()(())
()()()

Five. C(3) = 5. You can verify this by staring at the list and noticing there are no others. That's a valid proof at n=3.

Two pairs:

(())
()()

Two. C(2) = 2.

One pair: (). One way. C(1) = 1. C(0) = 1 by convention, which sounds like a lie but is necessary for the recurrence to not fall apart at the base case.

The sequence: 1, 1, 2, 5, 14, 42, 132, 429... Named after Eugène Catalan, though Minggatu computed them in China in the 1730s while working on trigonometric series, because mathematics has always been good at discovering the same thing twice.

Two Formulas, One Actually Useful

Closed form, using the central binomial coefficient:

C(n) = C(2n, n) / (n + 1)

Or equivalently:

C(n) = (2n)! / ((n+1)! * n!)

Mathematically clean. Also requires computing factorials up to (2n)!, which means intermediate values that overflow before you can blink at n=20. Fine if you have math.comb. Hostile if an interviewer expects you to derive it from scratch.

Recurrence (what DP problems actually use):

C(0) = 1
C(n) = C(0)*C(n-1) + C(1)*C(n-2) + ... + C(n-1)*C(0)
     = sum of C(i) * C(n-1-i) for i from 0 to n-1

The recurrence is the important one because it reveals why these numbers appear: each term C(i) * C(n-1-i) counts arrangements where one half has size i and the other has size n-1-i.

Why the Recurrence Makes Sense

This is the part most posts skip, which is exactly why people forget the formula.

Think about binary trees with n nodes. Pick a root. The left subtree has i nodes and the right has n-1-i nodes. There are C(i) ways to arrange the left and C(n-1-i) ways to arrange the right, and crucially, they don't interact at all. So the number of trees with root splitting as (left=i, right=n-1-i) is the product. Sum over all possible i from 0 to n-1.

The Catalan split: root divides into left subtree with C(i) arrangements and right subtree with C(n-1-i) arrangements, summed over all splits

The same logic applies to parentheses. A valid sequence of n pairs opens at position 0, and its matching close happens at position 2k for some k. That gives an "inner" valid sequence of k-1 pairs and an "outer" valid sequence of n-k pairs. Sum over all k.

The shape of the argument is always the same: split into two independent pieces, count each piece, multiply, sum. Any structure with this shape produces Catalan numbers. That's the whole thing. It sounds too simple, and then you check that it gives the right answers and realize it actually is that simple.

Computing C(n) With DP

def catalan(n: int) -> int: dp = [0] * (n + 1) dp[0] = 1 for i in range(1, n + 1): for j in range(i): dp[i] += dp[j] * dp[i - 1 - j] return dp[n]

Time: O(n²). For each dp[i], you loop over i positions. Space: O(n) for the table.

There's also a one-liner using the closed form:

from math import comb def catalan(n: int) -> int: return comb(2 * n, n) // (n + 1)

The DP version is more useful in interviews because the recurrence is visible and you can narrate it. The closed form is fine if the problem just needs the count and you can explain why it's equivalent. Neither is wrong. One of them makes you look like you understand where the formula comes from.

See recursion time complexity and the dynamic programming framework for why the DP table beats naive recursion here.

Where This Shows Up in Interviews

Unique Binary Search Trees (LeetCode 96)

Given nodes 1 through n, count the structurally unique BSTs. The answer is C(n).

Samuel L. Jackson: "BINARY TREE MOTHERFUCKER, DO YOU INVERT IT?", the energy of every technical interviewer who just assigned LeetCode 96

The interviewers do not actually want you to invert it. They want you to count the shapes.

For each choice of root r, the left subtree uses r-1 nodes and the right uses n-r nodes. The shapes depend only on the counts, not the specific values. So you get C(r-1) * C(n-r) shapes per root, summed over all r from 1 to n. That is exactly the Catalan recurrence with different variable names.

def numTrees(n: int) -> int: dp = [0] * (n + 1) dp[0] = 1 for nodes in range(1, n + 1): for root in range(1, nodes + 1): dp[nodes] += dp[root - 1] * dp[nodes - root] return dp[n]

Valid Parentheses Combinations (LeetCode 22)

LeetCode 22 asks you to generate all valid sequences of n pairs, not just count them. The count is C(n), but generating requires backtracking. The time is O(C(n) * n): there are C(n) valid sequences, each of length 2n.

This problem is the clearest illustration of the counting-vs-generating distinction. Counting takes O(n²). Generating is exponential. Same concept, two completely different problems.

Full Binary Trees (LeetCode 894)

Count full binary trees (every node has 0 or 2 children) with exactly n nodes where n is odd. Splitting a full binary tree with n nodes at the root gives a left subtree with 2k-1 nodes and a right with n-2k nodes, for k from 1 to (n-1)/2. Same product-and-sum structure, modified bounds.

Polygon Triangulation

A convex polygon with n+2 sides can be triangulated in exactly C(n) ways. This shows up in interval DP problems. Ways to parenthesize a product of n+1 matrices is also C(n). Matrix chain multiplication and polygon triangulation are structurally identical problems wearing different hats. Interviews rarely ask about triangulation directly, but once you know they're the same problem, the matrix chain version stops feeling arbitrary.

How to Spot a Catalan Problem in the Wild

You're looking at a Catalan problem when:

  1. The problem asks you to count something recursive where a structure of size n splits into two independent pieces
  2. Balanced or nested sequences appear (brackets, tags, matched pairs)
  3. You're counting distinct binary tree shapes
  4. A "left part of size i" and "right part of size n-1-i" decomposition feels natural

The clearest signal is when you naturally start writing dp[i] += dp[j] * dp[i-1-j] in your DP table. If a product of two subproblem counts appears in your recurrence and you're summing over all ways to split, it's Catalan.

The Growth Rate Is the Whole Complexity Story

C(n) grows roughly as 4^n / (n^(3/2) * sqrt(π)).

nC(n)
01
11
22
35
414
542
6132
1016,796
159,694,845
206,564,120,420

At n=20, there are over six billion valid bracket sequences. You will not generate all of them in an interview. Your interviewer does not want you to. Nobody wants you to. The computer technically can but the interview ends in 45 minutes.

Einstein discovering relativity vs me at 22 asking why the array's index is out of bounds, same energy as "count the structures" vs "generate all of them"

Counting takes O(n²). Generating takes until your interviewer changes their screen share.

Before you write a line of code, know which one the problem is asking for: counting is O(n²) DP; generating all structures is O(C(n) * n), which is exponential.

If you see an enumerate-all problem (like LeetCode 22 or 95), you're doing backtracking and the complexity floor is C(n). If you see a count-only problem (like LeetCode 96), you're doing DP and it's O(n²). Treating them as the same problem is how you end up attempting to enumerate six billion things in 45 minutes and then having an interesting story to tell later.

Practice in the Right Order

Two clear starting points: LeetCode 96 first (counting only, pure Catalan DP), then LeetCode 95 (generating all unique BSTs, recursive enumeration). Solve 96 until you can derive the recurrence from scratch by reasoning about BST structure. Not "I remember the formula." Reason it out from first principles. Then 95 shows you what generating C(n) objects in code actually looks like.

LeetCode 22 (Generate Parentheses) pairs naturally with both. After solving it, practice explaining why it produces exactly C(n) sequences without looking it up. Two sentences. If you can do that from scratch, you've internalized the structure and it won't evaporate before the interview.

Being able to explain this under pressure, not just code it quietly, is a separate skill. SpaceComplexity runs voice-based mock interviews with rubric-based feedback on exactly this: articulating why a recurrence is correct, not just writing it.

Further Reading