You're Running BFS. The Problem Wanted Union-Find.

- Union-Find runs in O(α(n)) per operation using path halving and union by size, effectively O(1) for any practical input
- Five fingerprints point to Union-Find: incremental edges, repeated connectivity queries, first cycle-forming edge, equivalence grouping, and component counting
- The cycle detection trap: check
find(u) == find(v)before callingunion, never after — checking after is always true and meaningless - Augmented DSU tracks component size, parity for bipartite checking, weighted offsets, and virtual supernodes for group-level connectivity queries
- Use BFS/DFS when edges are static and you need one answer; use Union-Find when edges arrive dynamically or membership queries repeat
- Two-pass pattern (LC 990): process all equality constraints first, then validate inequalities against the completed component structure
The problem gives you a list of edges. You build an adjacency list, fire up BFS, and count connected components. It works. Then the problem adds one twist: edges arrive one at a time, and after each one you need to answer whether two specific nodes are now connected. Your BFS reruns from scratch on every query. O(V+E) per query. With 10^5 queries. Time limit: one second.
You hear the judge typing "Time Limit Exceeded."
This is the moment the interviewer was waiting for. Your BFS solution is not wrong. It's just paying the full price of a graph traversal for a question that doesn't need one.
Union-Find exists for this situation: dynamic connectivity with repeated membership queries. Knowing when to use it comes down to spotting five fingerprints in the problem statement.

The natural progression when you discover your BFS reruns O(V+E) per query with 10^5 queries.
Two Arrays. Two Operations.
Two arrays. parent[i] stores who node i reports to, initialized as parent[i] = i (every node is its own root). size[i] stores how many nodes are in the tree rooted at i. Two operations:
find(x) walks up parent pointers until it hits a root, flattening the path on the way. union(a, b) calls find on both endpoints and, if they belong to different components, attaches the smaller tree under the larger one.
class UnionFind: def __init__(self, n: int): self.parent = list(range(n)) self.size = [1] * n self.components = n def find(self, x: int) -> int: while self.parent[x] != x: self.parent[x] = self.parent[self.parent[x]] # path halving x = self.parent[x] return x def union(self, a: int, b: int) -> bool: ra, rb = self.find(a), self.find(b) if ra == rb: return False # already connected if self.size[ra] < self.size[rb]: ra, rb = rb, ra self.parent[rb] = ra self.size[ra] += self.size[rb] self.components -= 1 return True
The find here uses path halving: at each step, jump to the grandparent and repoint there. One pass instead of two. Empirically this is slightly faster than full path compression because it makes fewer memory accesses and plays nicer with cache lines.
Combined with union by size, each operation runs in O(α(n)) amortized time. α is the inverse Ackermann function. Tarjan proved this in 1975. In practice, α(n) stays below 5 for any n you will ever encounter, which is a polite way of saying union-find is effectively O(1) per operation. The function grows so slowly it's basically a joke. Mathematicians love it.
The Five Fingerprints
These signals in a problem description point toward union-find. One is often enough. Two means the problem is shouting.
1. Edges or connections arriving incrementally. "Process these friendships one at a time." "Given a sequence of connections between servers." Dynamic edge arrival is the canonical union-find scenario because rebuilding a graph traversal per edge is O(V+E) per insertion, which adds up fast.
2. Repeated "are X and Y connected?" queries. If the question is connected-or-not (not "what is the path?"), union-find answers it in near-constant time. This is where BFS pays full fare on every query and union-find pays once per edge.
3. "Find the first edge that creates a cycle." Before adding edge (u, v), call find(u) and find(v). If they share a root, this edge is redundant because u and v are already connected. LeetCode 684 (Redundant Connection) is the direct form.
4. Grouping under equivalence. "Accounts with the same email belong to the same person." "All nodes with the same label should be clustered together." Equivalence classes are sets, and sets are what union-find manages natively.
5. Counting distinct components after processing all connections. Start with components = n. Each successful union decrements it by one. At the end, components holds the answer directly. No extra traversal needed.
The "Cycle Detection" Trap
Most explanations frame union-find as a cycle-detection tool. That framing is imprecise. And it causes a specific, predictable class of bugs. You will commit one of them in a contest at 11pm. It will feel correct. The judge will disagree.
Union-Find does not detect cycles. It detects same-component membership. The cycle exists because two nodes already share a root before you try to add an edge between them. You are not finding a cycle. You are asking "are these two nodes already connected?" and inferring that an edge between them would close a loop.
This distinction matters in implementation. The membership check happens before union, not after:
# Correct: check BEFORE merging for u, v in edges: if uf.find(u) == uf.find(v): return [u, v] # this edge closes a loop uf.union(u, v) # Wrong: merge first, then check for u, v in edges: uf.union(u, v) if uf.find(u) == uf.find(v): # always True after a successful union return [u, v] # fires on every edge, not just the bad one
The second version is meaningless. find(u) == find(v) is trivially true once you have merged them. You have successfully detected every edge as a cycle. Congratulations. That is not a useful thing to know.

Running your "cycle check" after union() and watching it flag every single edge.
Augmenting the DSU: Wider Than You Think
The size array does double duty. It balances the tree during unions, but it also lets you answer "what is the size of the largest connected component?" at any point. Read size[find(i)] for any node i, or maintain a running max during unions.
Augmenting the DSU with per-component data opens a much wider class of problems. Some patterns worth knowing:
- Max component size (LeetCode 827): track
sizeas above, find the largest component you can form by flipping one cell in a grid. - Weighted DSU: store each node's offset to its root. Path compression propagates offsets via addition or XOR. Useful for relative ordering, distance sums within components, or alternating-color constraints.
- Bipartite checking: store a parity bit per node representing its color relative to its root. When you union two components, set the parity so the two edge endpoints get opposite colors. If
find(u) == find(v)and both carry the same parity, there is an odd-length cycle and the graph is not bipartite. - Virtual nodes: to check whether any node in group A connects to any node in group B, create a virtual supernode for each group, then union every group member to that supernode. Checking group connectivity reduces to one
findcall per query.
Four Problems, Four Patterns
These four cover the most common ways union-find appears in interviews, from trivial to tricky.
LeetCode 547: Number of Provinces. You get an n x n adjacency matrix. Initialize DSU with n nodes. For every (i, j) where isConnected[i][j] == 1, call union(i, j). Answer is uf.components. Four lines after setup.
LeetCode 684: Redundant Connection. Process edges one by one. For each edge (u, v), check find(u) == find(v). If true, return the edge. Otherwise union(u, v). The problem guarantees exactly one extra edge, so the first check that fires gives the answer.
LeetCode 990: Satisfiability of Equality Equations. You get equations like "a==b", "b==c", "a!=d". This one teaches the two-pass pattern.
First pass: for every == equation, union the two characters.
Second pass: for every != equation, check whether the two characters share a root. If yes, the constraints are contradictory.
def equationsPossible(equations: list[str]) -> bool: uf = UnionFind(26) for eq in equations: if eq[1] == '=': uf.union(ord(eq[0]) - ord('a'), ord(eq[3]) - ord('a')) for eq in equations: if eq[1] == '!': a = ord(eq[0]) - ord('a') b = ord(eq[3]) - ord('a') if uf.find(a) == uf.find(b): return False return True
Build the equality graph completely before validating the inequalities against it. This two-pass structure shows up broadly in constraint-satisfaction problems. You cannot mix the passes and expect it to work.
LeetCode 721: Accounts Merge. Treat each email as a node. For each account, union all its emails under the first email in the list. Then group emails by root and sort each group. The non-obvious part: union-find works on integers, so map email strings to integer indices first. Build that mapping with a dict as you scan the accounts. Getting this mapping right is where most implementations fall apart.
BFS or Union-Find: A Thirty-Second Decision
Both can count connected components. The choice is quick once you ask the right question: does the problem need a path, or just yes/no connectivity?
| Situation | Reach for |
|---|---|
| All edges known upfront, one component question | BFS/DFS (simpler, same cost) |
| Edges arrive incrementally | Union-Find |
| Repeated "same component?" queries | Union-Find |
| Need the actual path between nodes | BFS/DFS |
| Shortest path | BFS (unweighted) or Dijkstra |
| Minimum spanning tree | Kruskal's (union-find at its core) |
If the problem gives you edges as a flat list and asks about connectivity rather than paths, union-find is almost certainly the intended approach.
BFS over a fully-known static graph is O(V+E), clean, no special data structure needed. Union-find over q queries across E edges is O((E+q) · α(n)), which for practical purposes is O(E+q). If q is large or edges are dynamic, the margin grows wide enough that BFS stops being a realistic option.
The connection to Kruskal's algorithm is direct: sort edges by weight, process them in order, skip any edge whose endpoints already share a root. The DSU is not a helper to Kruskal's. It is the algorithm. If you want to see that traced step by step, the Kruskal's algorithm post walks through it with examples.
The sharper version of the BFS vs union-find question is what SpaceComplexity pushes on in mock interviews: not "do you know union-find?" but "do you recognize within two minutes that this problem is asking about dynamic connectivity, not path existence?" Pattern recognition under time pressure is a different skill from knowing the template. You can memorize the implementation and still blow the interview by BFSing your way into TLE.
For a broader treatment of when to choose graph traversal strategies, BFS vs DFS covers the core decision rule.
Recap
- Initialize:
parent[i] = i,size[i] = 1,components = n. findwith path halving.unionby size, decrementcomponentson success.unionreturnsFalsewhen nodes share a root. Use that for cycle detection and component counting.- Five signals: incremental edges, repeated connectivity queries, first cycle-forming edge, equivalence grouping, component count.
- The trap: check membership before
union, not after. Checking after is always true and tells you nothing. - Augment with
size, parity, weighted offsets, or virtual nodes to reach harder problems. - Static graph, one question: BFS or DFS. Dynamic edges or repeated queries: union-find.