Top 14 Union-Find Problems for Coding Interviews, Ranked

May 29, 202610 min read
dsaalgorithmsinterview-prepleetcode
TL;DR
  • Union-Find cycle detection always fires when union() returns False — both nodes already share a root
  • Path compression + union by rank reduces all operations to O(α(n)), effectively constant for any practical input
  • Weighted DSU (LC 399) stores a multiplicative factor per node so find() computes the product along the compressed path
  • Virtual proxy nodes (LC 952) avoid O(n²) edge lists by routing numbers through their prime factors as shared intermediaries
  • Component size arithmetic (LC 2316) lets you count unreachable pairs with size × (n − size) summed across all roots
  • Symbolic union (LC 990) requires processing every equality edge before checking any inequality — order is non-negotiable
  • Directed graph DSU (LC 685) combines in-degree tracking with cycle detection and must handle three structurally distinct cases

You recognize union-find problems. You write the template. You feel quietly confident. Then the problem asks you to return the minimum ratio between two variables, and you stare at the parent array like it owes you money. There are five meaningfully different Union-Find variants, and most prep material only shows you the first one.

This guide covers 14 problems in order of increasing difficulty, grouped by the technique each one introduces. After working through them in sequence, you will have mapped every Union-Find pattern that appears in coding interviews: basic components, cycle detection, symbolic grouping, weighted edges, virtual nodes, and directed graphs.


Start Here: The Template You Will Copy-Paste Every Time

Every problem below starts from this template. Memorize it once and thank yourself later.

class DSU: def __init__(self, n): self.parent = list(range(n)) self.rank = [0] * n self.size = [1] * n self.components = n def find(self, x): if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) return self.parent[x] def union(self, x, y): px, py = self.find(x), self.find(y) if px == py: return False if self.rank[px] < self.rank[py]: px, py = py, px self.parent[py] = px self.size[px] += self.size[py] if self.rank[px] == self.rank[py]: self.rank[px] += 1 self.components -= 1 return True

union returns False when both endpoints are already connected. self.size[find(x)] gives the component size of any node. self.components counts how many groups remain. You will use at least one of those three in every problem below.


Tier 1: Pure Connectivity (Problems 1-5)

Five problems, one core question: how many groups exist, and does this edge create a cycle? You can basically solve all five after understanding the first one. The rest are just different packaging on the same idea.

1. LC 547, Number of Provinces (Medium)

The canonical entry point. You get an n x n adjacency matrix and must count connected components. Iterate the upper triangle, call union(i, j) on every isConnected[i][j] == 1, return dsu.components. This is the shape every Union-Find solution starts from: one union call per edge, one answer via component count.

Iterating the full matrix also works since redundant unions are no-ops. Doing it without comment in an interview signals you didn't think about it.

2. LC 200, Number of Islands (Medium)

Grid connectivity. Flatten 2D coordinates to a 1D index: i * cols + j. Initialize components to count only land cells. For each land cell, union with neighbors to the right and below. The lesson is coordinate mapping: 2D grids are just 1D arrays with a different address scheme.

BFS and DFS are simpler here. Union-Find earns its place when islands are added dynamically, which is a separate problem pattern.

3. LC 684, Redundant Connection (Medium)

A tree with one extra edge. Process edges in order. The first edge where union(u, v) returns False is the answer because both endpoints already belong to the same component, meaning this edge completes a cycle.

Common mistake: returning the edge index rather than the edge [u, v] itself. Classic "I had the right idea" rejection.

4. LC 990, Satisfiability of Equality Equations (Medium)

Characters as nodes. Two passes: first union every == pair, then verify no != pair shares a root. Process all equalities before checking any inequality. Interleaving the two passes can reject a valid pair before its transitive chain is fully built. Don't be clever with the pass order. It won't work.

5. LC 1319, Number of Operations to Make Network Connected (Medium)

Given n computers and a list of cables, each redundant cable (one where union returns False) can be relocated. You need exactly dsu.components - 1 relocations to connect everything. If redundant cables are fewer than dsu.components - 1, return -1. Otherwise return dsu.components - 1. Component count directly encodes the answer.


Tier 2: Creative Applications (Problems 6-10)

These problems use Union-Find as an engine inside a larger algorithm. The DSU template barely changes. The surrounding logic gets interesting. You will read several of these problem statements and think "wait, this isn't a Union-Find problem." Then it is.

6. LC 1202, Smallest String With Swaps (Medium)

Union all swappable index pairs. Within each component, you can freely rearrange characters, so sort them and place them back in their positions in sorted order. Swap transitivity means a long chain of swaps collapses into one group. Group indices by find(i), collect the characters, sort, write back.

7. LC 721, Accounts Merge (Medium)

Two accounts belong to the same person if they share any email. Map each email to an account index, then union each account with the prior owner of every shared email. Group all emails by root, sort each group, prepend the name. The lesson: Union-Find works on arbitrary objects by routing them through a hash map to integer IDs. Once you see this pattern, a whole class of "merge these groups" problems opens up.

8. LC 778, Swim in Rising Water (Hard)

Cell values represent time. Sort all cells by value. Process them in ascending order, unioning each newly available cell with its already-processed neighbors. Stop the moment (0,0) and (n-1,n-1) are in the same component. The answer is the value of the cell that triggers that union.

This "sort events, then union" structure appears repeatedly. Kruskal's MST is built the same way.

9. LC 1584, Min Cost to Connect All Points (Medium)

Classic Kruskal's MST on a complete graph. Generate all O(n²) edges with Manhattan distances, sort by cost, process with Union-Find. Sum only edges where union returns True. Stop when components == 1. The only departure from textbook Kruskal's is generating the edge list from coordinates rather than reading it.

See Kruskal's Algorithm for the full correctness proof via the cut property.

10. LC 2316, Count Unreachable Pairs (Medium)

After Union-Find, for each component of size s, every node inside it cannot reach the n - s nodes outside it. Sum s * (n - s) across all components and divide by two. Component size, not just component count, drives the answer.


Tier 3: Hard Extensions (Problems 11-14)

Each of these modifies the DSU template in a structural way. Study them separately before combining. Problem 11 in particular has humbled many engineers who thought they had Union-Find figured out after Tier 1.

11. LC 399, Evaluate Division (Medium, weighted variant)

Variables as nodes, division ratios as weighted edges. find(x) must also return the multiplicative factor from x to its root. When you union two components, compute weight[px] = weight[y] / weight[x] * weight[py] to maintain ratio consistency. Path compression propagates the running product at each step. This is weighted Union-Find: every node stores a relative value, and find computes the product along the path.

This is the hardest conceptual leap in the list. The basic template tracks membership. Weighted DSU tracks membership and a numerical relationship simultaneously. Understand it once and all weighted DSU variants follow. But it takes real effort to get it the first time, and that effort is worth paying.

12. LC 839, Similar String Groups (Hard)

Two strings are similar if they differ in exactly 0 or 2 positions. Compare every pair in O(n² × k). Union similar pairs, return the number of components. The lesson: sometimes the union trigger is a custom O(k) similarity check rather than a simple edge lookup. For large n and k, this approaches the time limit and forces careful constant-factor optimization.

13. LC 952, Largest Component Size by Common Factor (Hard)

Unioning numbers directly would require O(n²) comparisons. Instead, union each number with all its prime factors as virtual proxy nodes. Factor each number in O(√val), union num with each factor, then find the largest component among the original numbers only. Virtual nodes express shared structure without a quadratic edge list.

This trick, adding intermediary proxy nodes to avoid a dense direct graph, appears in other Union-Find hard problems too. Worth internalizing.

14. LC 685, Redundant Connection II (Hard)

A directed graph that almost forms a valid rooted tree. The extra edge causes a cycle, a node with two parents, or both. Three cases: an in-degree-2 node with no cycle (remove the later candidate edge), a cycle with no in-degree-2 node (remove the cycle edge), or both (remove the candidate that participates in the cycle). Union-Find handles the cycle check. The in-degree constraint narrows which edges are candidates. Most solutions that look correct fail on the third case. Enumerate all three explicitly.


Union-Find Problems: Quick Reference

#ProblemDifficultyKey Technique
1547 Number of ProvincesMediumConnected components
2200 Number of IslandsMediumGrid flattening
3684 Redundant ConnectionMediumCycle detection
4990 Equality EquationsMediumSymbolic union
51319 Network ConnectedMediumComponent count shortfall
61202 Smallest String SwapsMediumSwap transitivity
7721 Accounts MergeMediumMap-based identity merge
8778 Swim in Rising WaterHardSort events then union
91584 Min Cost Connect PointsMediumKruskal's MST
102316 Unreachable PairsMediumComponent size arithmetic
11399 Evaluate DivisionMediumWeighted DSU
12839 Similar String GroupsHardCustom union trigger
13952 Largest Component FactorHardVirtual proxy nodes
14685 Redundant Connection IIHardDirected cycle + in-degree

What the Progression Teaches

  • Problems 1-5 share one idea. If you can solve 547, you can solve all five with small adjustments.
  • Problems 6-10 wrap DSU inside a broader algorithm. The data structure does not change.
  • Problems 11-14 each modify DSU in a structural way: weights, custom triggers, virtual nodes, directed edges.
  • The cycle detection signal is always: union returns False.
  • The component size signal is always: dsu.size[dsu.find(x)].
  • Weighted DSU (399) is the most mechanically distinct. Treat it as a separate subpattern.
  • Virtual nodes (952) are the most creative extension. When direct edges would be O(n²), add a shared proxy.

For a deep dive into path compression and union by rank, see Union-Find Algorithm: Two Arrays That Make Connectivity Nearly Free. For the five signals that tell you Union-Find is the right choice before you start coding, see You're Running BFS. The Problem Wanted Union-Find..


Turn the List Into Reps

A curated problem list gets you through the patterns. Narrating your approach under interview pressure is a separate skill. SpaceComplexity runs voice-based DSA mock interviews with rubric feedback on exactly the dimensions Union-Find tests: problem recognition, approach clarity, complexity analysis, and edge case handling. Once you have worked through this list, a few rounds of timed voice practice will confirm that the articulation is as clean as the code.


Further Reading