What Is a Connected Component? The Graph Concept Behind Every Island Problem

June 4, 20269 min read
dsaalgorithmsinterview-prepgraphs
What Is a Connected Component? The Graph Concept Behind Every Island Problem
TL;DR
  • Connected components are maximal sets of mutually reachable nodes in an undirected graph; an isolated node still counts as a component of size one.
  • DFS and BFS both find every component in O(V + E) time; use DFS for simplicity, BFS when recursion depth is a concern on large graphs.
  • Union-Find with path compression and union-by-rank runs near O(E) and wins when edges arrive incrementally or you need cycle detection as a byproduct.
  • The visited-before-enqueue rule prevents duplicate processing; marking on dequeue instead can inflate your BFS queue from O(V) to O(V²).
  • Grid problems like Number of Islands are connected component problems with implicit adjacency—the same DFS skeleton applies with boundary and cell-value checks.
  • Directed graphs split into weakly connected components (ignore direction) and strongly connected components (respect direction); most interview problems are undirected.

That "number of islands" problem you've seen a hundred times? You're counting connected components. Most engineers solve it without knowing that's what they're doing, which means they don't recognize the same pattern when it shows up wearing a different costume: provinces, friend circles, accounts to merge, clusters to detect.

You've been implementing this algorithm for years. Let's give it a name.

Reachable Means Binary

A connected component is a maximal set of nodes in an undirected graph where every node is reachable from every other node by following edges.

Two words matter there. "Reachable" means you can get from A to B by walking edges in any direction. "Maximal" means you can't add another node without breaking the reachability property. No stragglers included, no reachable nodes excluded.

A concrete picture:

1 - 2 - 3    4 - 5    6

Three components: {1, 2, 3}, {4, 5}, and {6}. Node 6 has no edges, but a single isolated node is still a component. Size one, but a component.

Now add one edge between 3 and 4:

1 - 2 - 3 - 4 - 5    6

Two components: {1, 2, 3, 4, 5} and {6}. The moment you connect two components, they merge. You're not measuring closeness or some fuzzy affinity score. You're asking a binary question: can I get from here to there?

One DFS Pass, One Component Counted

The standard algorithm loops over every node. For each unvisited node, it starts a DFS and marks everything reachable from that node. When the DFS returns, you've traced one complete component. Increment a counter. Keep going.

def count_components(n: int, edges: list[list[int]]) -> int: graph: list[list[int]] = [[] for _ in range(n)] for u, v in edges: graph[u].append(v) graph[v].append(u) visited: set[int] = set() def dfs(node: int) -> None: for neighbor in graph[node]: if neighbor not in visited: visited.add(neighbor) dfs(neighbor) count = 0 for node in range(n): if node not in visited: visited.add(node) dfs(node) count += 1 return count

The outer loop guarantees you hit every node exactly once. The inner DFS exhausts the component before returning.

Mark the node as visited before the DFS call, not inside it. If you mark inside, multiple neighbors can queue the same node before any of them marks it, which corrupts both your visited set and your count. Ask me how I know.

Algorithm - when programmers don't want to explain what they did. Heuristic - when programmers can't explain what they did. Machine Learning - when programmers don't know what they did.

Number of Islands: when you implemented a connected components algorithm and didn't know what you did.

BFS: For When You Don't Trust Your Stack

Some engineers prefer BFS to avoid recursion depth issues. On a path graph with a million nodes, the DFS above would blow Python's call stack. BFS uses an explicit queue on the heap instead, which has the same logical behavior but a much bigger budget:

from collections import deque def count_components_bfs(n: int, edges: list[list[int]]) -> int: graph: list[list[int]] = [[] for _ in range(n)] for u, v in edges: graph[u].append(v) graph[v].append(u) visited: set[int] = set() count = 0 for start in range(n): if start in visited: continue queue: deque[int] = deque([start]) visited.add(start) while queue: node = queue.popleft() for neighbor in graph[node]: if neighbor not in visited: visited.add(neighbor) queue.append(neighbor) count += 1 return count

Same O(V + E) complexity, different stack behavior. The visited-before-enqueue rule matters here too. If you mark on dequeue instead of enqueue, the same node gets pushed multiple times and your queue can grow from O(V) to O(V²). One small oversight, quadratic surprise.

See depth-first search and breadth-first search for the full treatment of each traversal.

When Union-Find Wins

When edges arrive incrementally, or when you need to detect cycles as a side effect, Union-Find often wins. The name sounds intimidating. The idea isn't. Every node starts as its own component. Each edge call merges two components. Count the merges, track what's left.

class UnionFind: def __init__(self, n: int): self.parent = list(range(n)) self.rank = [0] * n self.components = n def find(self, x: int) -> int: if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) return self.parent[x] def union(self, x: int, y: int) -> bool: px, py = self.find(x), self.find(y) if px == py: return False if self.rank[px] < self.rank[py]: px, py = py, px self.parent[py] = px if self.rank[px] == self.rank[py]: self.rank[px] += 1 self.components -= 1 return True def count_components_uf(n: int, edges: list[list[int]]) -> int: uf = UnionFind(n) for u, v in edges: uf.union(u, v) return uf.components

The union() returns False when two nodes already share the same root. That's your cycle detector: if adding an edge returns False, it's a redundant connection. The same data structure that counts components also solves LeetCode 684 (Redundant Connection) without any extra code. See when to use Union-Find for the decision framework.

Directed Graphs Ruin Everything (Briefly)

Everything above assumes undirected edges. For directed graphs, the concept splits, and you'll need to know which version the problem is asking for.

A weakly connected component ignores edge direction and applies the same DFS/BFS algorithm.

A strongly connected component (SCC) requires that every node can reach every other node following edge directions. Given A -> B -> C with no return edges, all three nodes form one weak component but three SCCs. A can reach B and C, B can reach C, but C can reach neither A nor B.

Finding SCCs requires Kosaraju's algorithm or Tarjan's algorithm, both O(V + E). In most interview problems you'll be on undirected graphs. If the problem says "directed," stop and clarify whether they want weak or strong connectivity. That one question saves you from implementing the wrong algorithm entirely.

They're All O(V + E). Pick One.

ApproachTimeSpace
DFSO(V + E)O(V)
BFSO(V + E)O(V)
Union-Find (path compression + rank)O(E · α(V)) ≈ O(E)O(V)

V is vertices, E is edges. The O(V + E) bound comes from visiting every node once and traversing every edge at most twice (once from each endpoint in an undirected graph).

Union-Find's α(V) is the inverse Ackermann function. For any input you'll see in an interview, treat it as O(1) per operation. The practical difference between DFS and Union-Find is small. Pick whichever fits the problem's shape: DFS for static graphs you traverse once, Union-Find for dynamic graphs where edges arrive over time.

Space is O(V) for all three. DFS stacks O(h) frames where h is the DFS tree height, worst-case O(V) on a path graph. BFS and Union-Find use O(V) explicitly.

Grid Problems Are Component Problems

The grid DFS pattern in "number of islands" isn't a separate algorithm. It's the same component-finding DFS with an implicit adjacency structure. Instead of an adjacency list, your neighbors are the four (or eight) adjacent cells that pass a condition.

def num_islands(grid: list[list[str]]) -> int: if not grid: return 0 rows, cols = len(grid), len(grid[0]) visited: set[tuple[int, int]] = set() def dfs(r: int, c: int) -> None: if (r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] == '0' or (r, c) in visited): return visited.add((r, c)) dfs(r + 1, c) dfs(r - 1, c) dfs(r, c + 1) dfs(r, c - 1) count = 0 for r in range(rows): for c in range(cols): if grid[r][c] == '1' and (r, c) not in visited: dfs(r, c) count += 1 return count

The moment you can model a grid problem as "nodes with adjacency conditions," connected components is your framework. Rotten Oranges, Pacific Atlantic Water Flow, Flood Fill, Surrounded Regions. All the same skeleton.

How do functions break up? They stop calling each other!

Every island on that grid is a clique of nodes that stopped calling their neighbors.

The Problems You've Already Solved (You Just Didn't Know)

LeetCode 200 - Number of Islands. Count components in a grid where '1' cells are nodes.

LeetCode 547 - Number of Provinces. An n×n adjacency matrix. Count components.

LeetCode 684 - Redundant Connection. Find the edge that creates a cycle. Union-Find: the first edge whose union() returns False is the answer.

LeetCode 721 - Accounts Merge. Accounts with shared emails need merging. Build a graph where emails are nodes and edges connect emails in the same account. Find components. Each component is one merged account.

LeetCode 1319 - Number of Operations to Make Network Connected. Union-Find lets you count both connected components and redundant edges simultaneously.

The signal to look for: "how many groups," "how many clusters," "are these two nodes connected," "merge all X that share Y." Any of those phrasings points at component counting.

For deeper practice on recognizing the shape before you code, DFS pattern recognition and BFS pattern recognition map out the full decision tree.

Nail These Two Questions Before You Touch the Keyboard

In an interview, the first thing to nail down is undirected versus directed. It changes the algorithm class entirely. A quick "are these edges directed?" buys you the right framework before you write a line of code.

The second question: do you need to count components, label which component each node belongs to, find the largest component, or check connectivity between two specific nodes? All of these use the same DFS/BFS loop, but what you return at the end differs. Clarify the output before you optimize.

If you want to practice explaining your approach out loud the way a real interview demands, SpaceComplexity runs voice-based mock interviews with rubric feedback on exactly this kind of graph reasoning. Try a session before your next onsite.

Further Reading