What Is a Centroid of a Tree? The Node That Balances Everything

June 11, 20267 min read
dsaalgorithmsinterview-prepdata-structures
What Is a Centroid of a Tree? The Node That Balances Everything
TL;DR
  • Centroid of a tree: the node whose removal leaves every component with at most ⌊n/2⌋ nodes
  • Existence guarantee: every tree has 1 or 2 centroids; two centroids means exactly one edge splits the tree perfectly in half
  • Walk property: after removing any node, at most one component can exceed n/2, so walking toward the heavy child always terminates at the centroid
  • Two DFS passes: compute subtree sizes bottom-up, then descend toward any child with size > n/2 until none qualifies
  • O(n) time and space; use the iterative version in Python for trees larger than a few thousand nodes to avoid stack overflow
  • Interview relevance: the centroid is the building block for centroid decomposition, enabling O(n log n) tree path problems via recursive balanced splitting

Every tree has a balance point. A node that, when removed, leaves nothing behind that's "too big." That node is the centroid. Remove it and every resulting component has at most ⌊n/2⌋ nodes. That guarantee is what makes the centroid useful, and it's also why finding it is O(n) with a surprisingly simple walk.

It sounds like abstract math until you realize the whole algorithm is just "keep moving toward the heavy side." You've probably done this intuitively when picking up a table with someone. The centroid is where you both naturally end up standing.

What the Centroid of a Tree Actually Means

A centroid of a tree is a node whose removal splits the tree into components where none has more than ⌊n/2⌋ nodes, where n is the total number of nodes.

Every tree has at least one centroid. No tree has more than two. When two exist, they're always adjacent.

The Example That Makes It Click

Nine-node tree:

            1
          / | \
         2  3  4
        /      |
       5        6
      / \        \
     7   8        9

n = 9, so ⌊n/2⌋ = 4. For a node to be a centroid, every component after its removal must have at most 4 nodes.

Node 1: Remove it. Components: {2, 5, 7, 8} (size 4), {3} (size 1), {4, 6, 9} (size 3). Largest is 4. Centroid.

Node 2: Remove it. Components: {1, 3, 4, 6, 9} (size 5), {5, 7, 8} (size 3). Largest is 5. Not a centroid.

Node 5: Remove it. Components: {1, 2, 3, 4, 6, 9} (size 6), {7}, {8}. Way over. Not a centroid.

Node 1 is the only centroid here. Nodes near the leaves fail because removing them leaves most of the tree behind as one enormous component, which is the structural version of picking up a table from the corner and wondering why you're doing all the work.

At Most One Direction Can Be Wrong

This is the insight that makes the algorithm work.

If you remove any node v, the remaining pieces partition the other n-1 nodes. If two pieces each had more than n/2 nodes, their combined size would exceed n-1. That's impossible.

At most one component can exceed n/2 nodes after removing any single node.

So if you're standing at a node that isn't the centroid, the violation lives in exactly one direction. Walk toward it. You'll find the centroid. It's like the Price Is Right podium walk. One wrong door, and you know which door is right.

Finding It in O(n)

Two DFS passes.

Pass 1: Root the tree at any node. Compute the subtree size at every node.

Pass 2: Start at the root. At each node, check if any child's subtree has more than n/2 nodes. If one does, descend into that child. If none does, you're at the centroid.

from collections import defaultdict def find_centroid(n, edges): adj = defaultdict(list) for u, v in edges: adj[u].append(v) adj[v].append(u) size = [1] * (n + 1) def compute_size(node, parent): for child in adj[node]: if child != parent: compute_size(child, node) size[node] += size[child] def get_centroid(node, parent): for child in adj[node]: if child != parent and size[child] > n // 2: return get_centroid(child, node) return node compute_size(1, -1) return get_centroid(1, -1)

After compute_size on the nine-node example:

size[1]=9, size[2]=4, size[3]=1, size[4]=3
size[5]=3, size[6]=2, size[7]=1, size[8]=1, size[9]=1

get_centroid starts at node 1. size[2] = 4, not > 4. size[3] = 1. size[4] = 3. No child exceeds 4. Return node 1.

The starting root doesn't matter. The walk always terminates at the same centroid regardless of where you begin, which is deeply satisfying and also the kind of thing that makes you want to tell someone at a party.

When There Are Two Centroids

Two centroids exist when one edge splits the tree exactly in half.

1 -- 2 -- 3 -- 4

n = 4. Remove node 2: components {1} and {3, 4}, sizes 1 and 2. Remove node 3: components {1, 2} and {4}, sizes 2 and 1. Both nodes 2 and 3 satisfy the condition. They're adjacent, separated by the one edge that halves the tree.

The algorithm above returns one of them. If you need both, check whether the returned node's neighbors also qualify. Most applications only need one.

When Python Decides You've Had Enough

Python's default recursion limit is 1,000 frames. On a path graph with 100,000 nodes, both DFS passes will overflow. Python will stop you. Firmly.

Python's recursion limit in action: TF2 Scout says "I'm gonna stop you right there"

Python, the moment your DFS hits frame 1001.

Use the iterative version:

from collections import defaultdict def find_centroid(n, edges): adj = defaultdict(list) for u, v in edges: adj[u].append(v) adj[v].append(u) size = [1] * (n + 1) parent = [-1] * (n + 1) order = [] stack = [1] visited = [False] * (n + 1) visited[1] = True while stack: node = stack.pop() order.append(node) for child in adj[node]: if not visited[child]: visited[child] = True parent[child] = node stack.append(child) for node in reversed(order): if parent[node] != -1: size[parent[node]] += size[node] node = 1 while True: heavy = next( (c for c in adj[node] if c != parent[node] and size[c] > n // 2), None ) if heavy is None: return node node = heavy

Same O(n) time and O(n) space. Processing nodes in reverse DFS order guarantees children are computed before parents. For more on why this matters for space, see Recursion Space Complexity: Your Stack Only Holds One Path at a Time.

O(n) Time, O(n) Space

Both passes visit every node and edge exactly once. Finding the centroid is O(n) time and O(n) space.

The space comes from the adjacency list, the size array, and the call stack. On a balanced tree the recursion depth is O(log n). On a path graph it's O(n). The iterative version stays O(n) regardless of tree shape.

Why Interviewers Ask About This

The centroid is the foundation for centroid decomposition, a divide-and-conquer technique for tree path problems. Find the centroid, solve the subproblem for all paths passing through it, remove it, recurse on each remaining component.

Because no component ever exceeds n/2 nodes, the recursion depth is O(log n). That yields O(n log n) overall when per-centroid work is O(n), or O(n log² n) when sorting is needed per level. Problems in this class include counting pairs with path length exactly k and finding the closest node with a given value from any starting point. For the full treatment, see Centroid Decomposition: One Node That Balances Everything.

At most companies you won't implement centroid decomposition from scratch in 45 minutes. But knowing what a centroid is, why the walk works, and why the recursion depth stays O(log n) is the kind of depth that separates strong hires at algorithm-heavy firms. It sharpens your intuition about divide-and-conquer on trees generally, which pays off in ways that are hard to predict and easy to see in the debrief.

If you want to practice explaining concepts like this out loud under interview pressure, SpaceComplexity runs voice-based mock interviews with rubric feedback. The definition is easy to memorize. Explaining why the traversal is always correct is where most people discover they didn't actually understand it.

Key Takeaways

  • Centroid: a node whose removal leaves no component with more than ⌊n/2⌋ nodes.
  • Every tree has 1 or 2 centroids. Two centroids means exactly one edge splits the tree in half, and both endpoints qualify.
  • At most one component can exceed n/2 after any removal. That's why "walk toward the heavy child" terminates at the centroid.
  • Two DFS passes. O(n) time, O(n) space.
  • In Python, use the iterative version for n larger than a few thousand. Or raise the recursion limit and enjoy the segfault lottery.
  • The centroid is the building block for centroid decomposition: O(n log n) tree path problems via recursive balanced splitting.
  • For DFS mechanics underlying both passes, see Depth-First Search: The Algorithm Behind Half Your Graph Problems.

Further Reading