Union-Find vs DFS for Connectivity: One Question Decides It

You have a graph. You need to know if two nodes are connected. Your brain says DFS. It always says DFS. DFS is your graph brain's answer to everything, the same way "turn it off and back on again" is your non-graph brain's answer to computers.

That instinct is right most of the time. When it's wrong, you're rebuilding a full traversal on every edge insertion while Union-Find would have answered in constant time. That's the part your brain forgets to mention.

Both solve connectivity. The union-find vs DFS decision comes down to when they do it and how many times you plan to ask.

The One Question That Decides Everything

Is the graph being built while you query it, or are you querying a finished graph?

That's it. The entire decision tree. One branch.

Edges arrive one at a time and you need to answer "are A and B connected?" after each one? Union-Find.
You have the full graph already and need one pass to find components, paths, or island counts? DFS.

If you take nothing else from this post, take that.

DFS: One Full Sweep

DFS marks nodes visited as it recurses. Every node reachable from your starting node is in the same component. Run it from every unvisited node and you've labeled all of them.

def count_components(n: int, edges: list[list[int]]) -> int:
    graph = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    visited = [False] * n
    count = 0

    def dfs(node):
        visited[node] = True
        for neighbor in graph[node]:
            if not visited[neighbor]:
                dfs(neighbor)

    for i in range(n):
        if not visited[i]:
            dfs(i)
            count += 1

    return count

Time: O(V+E). Space: O(V+E) for the adjacency list plus O(V) call stack.

The structural limit: if you add an edge after building the graph, you rebuild everything and re-run. New edge arrives. Throw out all your work. Start over. For a static graph that's fine. For a graph that grows as you process it, that's the software equivalent of reprinting your entire spreadsheet every time you add one row.

Union-Find: Never Rebuild

Union-Find maintains a forest where each tree is one connected component. Every node starts as its own root. Add an edge: merge two trees. Query connectivity: check if they share a root.

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        px, py = self.find(x), self.find(y)
        if px == py:
            return False
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

Time per operation: O(α(n)) amortized, where α is the inverse Ackermann function. For any n that will ever appear in a real program, or in theory, or honestly in science fiction, α(n) ≤ 5. Treat it as O(1). The complexity theorists wrote a paper about it and everything.

Path compression is what makes this near-constant. The first find on a node traverses up the tree. Every subsequent call returns instantly because the parent pointer was flattened directly to the root. You paid once. You're done forever.

What the Numbers Say

	DFS	Union-Find
Build time	O(V+E)	O(n)
Single connectivity query	O(V+E)	O(α(n)) ≈ O(1)
Add an edge, then query	O(V+E) rebuild	O(α(n)) per op
Space	O(V+E)	O(n)
Find the actual path	Yes	No
Stream edges online	No	Yes
Cycle detection during build	No	Yes

One useful trick for DFS: run it once on the static graph, assign each node a component ID, then answer any "same component?" query in O(1). Works fine. But only on a frozen graph. The moment it thaws, you're reprinting the spreadsheet again.

Where Each One Wins

DFS wins: grid problems

Number of Islands is the canonical example. You have a 2D grid. You want to count connected land masses. The graph exists fully formed before you start. DFS is the natural tool.

def num_islands(grid: list[list[str]]) -> int:
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    count = 0

    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != '1':
            return
        grid[r][c] = '0'
        dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1)

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                dfs(r, c)
                count += 1
    return count

You could use Union-Find here by mapping 2D coordinates to 1D indices. It would work. It would also be longer, harder to read, and would impress exactly no one. That's the second failure mode: over-engineering because you just learned a new tool and it's excited to be used.

DFS wins: path reconstruction

Union-Find can tell you if A and B are connected. It cannot tell you how to get from A to B.

The parent array in Union-Find tracks tree roots for merging, not traversal paths. If you need the actual route, use DFS or BFS. Union-Find threw away the directions the moment it decided which root to keep.

Union-Find wins: online edge insertion

Redundant Connection (LeetCode 684) gives you edges one at a time and asks which one first creates a cycle. With DFS you'd re-run traversal for each edge. Union-Find answers it in five lines.

def find_redundant_connection(edges: list[list[int]]) -> list[int]:
    uf = UnionFind(len(edges) + 1)
    for u, v in edges:
        if not uf.union(u, v):
            return [u, v]
    return []

The DFS version runs cycle detection on each incremental addition: O(E * (V+E)) total vs O(E * α(V)). For a hundred edges, that difference is the gap between "fast" and "maybe I should get a coffee while this finishes."

Union-Find wins: satisfiability constraints

LeetCode 990 gives you constraints like a==b, b==c, c!=a and asks you to detect contradiction. Process all equality constraints first with Union-Find, then check inequality constraints against the merged groups. DFS is awkward here because the graph is implicit and you'd need to construct it first anyway.

Union-Find wins: Kruskal's MST

Kruskal's sorts edges by weight and adds them greedily when they don't create a cycle. The cycle check is exactly Union-Find's whole job.

Sort edges by weight.
For each edge (u, v, w):
    if find(u) != find(v):    ← O(α(n))
        add to MST
        union(u, v)

The whole algorithm is O(E log E) dominated by sorting. Union-Find is so fast here it barely shows up in the analysis.

What They Actually Look Like

After union(1,2), union(2,3), union(4,5), union(2,4):

Union-Find forest vs DFS graph: parent pointers point to shared root vs edges stored explicitly

Union-Find parent array:

node:    1   2   3   4   5
parent:  2   2   2   2   4
rank:    0   2   0   1   0

DFS graph for the same edges:

1 -- 2 -- 3
     |
     4 -- 5

The DFS graph has edges. Union-Find has only parent pointers. That's why DFS can traverse paths and Union-Find can't: it discarded the edge information the moment it chose which root to keep. Querying find(1) and find(5) both trace to node 2. Same component. Next call to find(1) sets parent[1] = 2 directly, skipping intermediate nodes. It remembered the answer and will never walk that path again.

The Signal You're Sending

Reaching for DFS on a connectivity problem where edges arrive online signals pattern-matching without reading the constraint. The interviewer will nudge you toward Union-Find, and how you respond to that nudge is itself a scored signal. "Oh right, the graph isn't static" is what you want to say. Not "no no, my DFS approach works, let me just re-run it after every edge" while the interviewer checks their watch.

Reaching for Union-Find on a grid problem where the graph is static suggests you've memorized it as "the connectivity algorithm" without asking whether you actually need dynamic updates. Also a signal. Arguably a worse one, because you made your code three times longer for no reason and the interviewer now has to watch you finish implementing it.

Ask yourself before writing any code: "Do I need to answer connectivity queries as edges are being added, or do I have the full graph upfront?" That single question routes you correctly every time.

If you want to practice this under interview pressure, SpaceComplexity runs voice-based mock interviews with rubric-based feedback, including whether you're picking the right algorithm before you start coding.

Where This Goes Wrong

Using DFS when edges arrive incrementally. You rebuild on every edge. For n edges: O(n * (V+E)) when Union-Find would be O(n * α(n)). You'll probably notice something is slow. You might not notice why until it's too late to fix.

Using Union-Find when you need the path. You'll realize mid-implementation that your parent array doesn't store traversal order. Then you backtrack to DFS. The interviewer has now watched you implement the wrong data structure for five minutes. The clock does not pause for this.

Forgetting path compression. Without it, Union-Find degrades to O(log n) with union by rank alone, or O(n) with neither optimization. Both together get you to O(α(n)). Skip path compression and you've written a slower version of DFS with worse code. Congratulations.

Off-by-one in initialization. If nodes are 1-indexed (common in LeetCode graph problems), initialize parent = list(range(n+1)). Missing index n causes a bounds error on the last node. It always does. It will happen to you too.