What Is a Strongly Connected Component? The Interview Guide
- Strongly connected component (SCC): maximal group in a directed graph where every node can reach every other node via directed edges
- Condensation collapses each SCC into a super-node, always producing a DAG — unlocking topological sort on graphs that had cycles
- Kosaraju's algorithm uses two DFS passes plus an explicit transposed graph: O(V+E) time, O(V+E) space
- Tarjan's algorithm uses a single DFS pass with disc and low-link arrays: O(V+E) time, O(V) space — more cache-friendly on large inputs
- Cycle detection in directed graphs reduces to checking whether any SCC contains more than one node
- 2-SAT and dependency resolution with cycles are the two classic applied SCC problems beyond basic cycle detection
Most graph problems you see in interviews involve undirected graphs. Nodes connect, paths go both ways, and "connected" has an obvious meaning. Directed graphs are different. Add direction, and a path from A to B tells you nothing about whether you can get back.
That asymmetry is where strongly connected components come in. It's also where a lot of candidates mentally check out and just hope the interviewer picks something else.
A strongly connected component (SCC) is a maximal group of nodes in a directed graph where every node can reach every other node by following directed edges. Every node inside the component has a route to every other node inside it. No exceptions.
Why "Connected Component" Doesn't Cut It for Directed Graphs
With undirected graphs, a connected component is simple: keep following edges and see how far you can get. Edge between A and B means A can reach B and B can reach A. Everyone is happy.
Directed edges break that symmetry. Node A might have an edge to B, but B might have no edge back to A. You can reach B from A, but you cannot return. You've walked into a room with no door handle on the inside.
That's why the definition of SCC requires reachability in both directions. For nodes u and v to belong to the same SCC, two paths must exist: one from u to v, and one from v to u. If either direction is missing, they're in different SCCs.
Even a single isolated node with no edges is its own SCC. It can trivially reach itself. So every node in a directed graph belongs to exactly one SCC, even the loners.
Two Cycles, Two SCCs
Take this directed graph with six nodes:
Edges: 0→1, 1→2, 2→0, 2→3, 3→4, 4→3
Start with nodes 0, 1, and 2. You can reach 1 from 0 (directly), 2 from 1, and 0 from 2. The cycle closes, so all three can reach each other. First SCC: {0, 1, 2}.
Now look at nodes 3 and 4. Edge 3→4 and edge 4→3 form a two-node cycle. Both can reach each other. Second SCC: {3, 4}.
Notice that node 3 is reachable from node 2 (path: 2→3), but there is no path from 3 back to 2. You can check in to the {3, 4} group from the {0, 1, 2} group. You cannot check out. That one-way door means the two groups are separate SCCs.
That's the "maximal" part of the definition. {0, 1, 2, 3} is not an SCC because you can't get from 3 back to 0. Maximal means you've found the largest possible group where mutual reachability holds.
The Condensation: Every Directed Graph Secretly Wants to Be a DAG
Here's a property that feels surprising the first time. Take any directed graph and collapse each SCC into a single super-node. Connect two super-nodes if there was any edge between the original groups.
The resulting graph is always a directed acyclic graph (DAG).
It has to be. If the condensation had a cycle among super-nodes, then the nodes inside those super-nodes could all reach each other, which means they should have been one SCC in the first place. That would contradict your SCC definitions. So a cycle in the condensation is impossible.
For the example above, the condensation has two nodes: one for {0, 1, 2} and one for {3, 4}. One directed edge between them. A clean two-node DAG.
This matters because DAGs have topological orderings, and topological ordering unlocks a large family of algorithms: processing dependencies in order, finding critical paths, dynamic programming on graphs. The SCC condensation is the standard way to take a cyclic directed graph and reduce it to a form where those algorithms apply.
Two Algorithms, Both O(V+E), One Obvious Trade-off
Kosaraju's Algorithm (Two DFS Passes)
- Run DFS on the original graph. As each node finishes (all its neighbors explored), push it onto a stack.
- Build the transpose graph: flip every edge direction.
- Pop nodes from the stack one by one. For each unvisited node, run DFS on the transpose graph. Every node you visit in this DFS belongs to the same SCC.
The intuition: the first DFS identifies which nodes "finish last" in a way that respects SCC boundaries. Running DFS on the reversed graph from those nodes in reverse-finish order walks exactly within one SCC at a time.
Yes, it does DFS twice. Yes, it builds a whole transposed graph. This is the algorithm equivalent of reading a document, then reading it again backwards to make sure you understood it. It works.
Space cost: O(V+E), because you need to store the transposed graph explicitly.
See Kosaraju's algorithm for the full walkthrough and proof.
Tarjan's Algorithm (One DFS Pass)
Tarjan's uses a single DFS with two arrays and a stack. For each node it tracks disc (discovery time) and low (the earliest discovery time reachable from the node's subtree). Whenever low[u] == disc[u], node u is the root of an SCC. Everything on the stack above u belongs to that SCC.
Space cost: O(V) extra space (no transposed graph required). In practice, Tarjan's is more cache-friendly and slightly faster on large inputs.
See Tarjan's algorithm for the low-link proof and implementation.
Both algorithms are standard knowledge for any graph-heavy interview loop. Tarjan's is slightly trickier to implement from memory, so most people default to Kosaraju's when given the choice. More DFS, fewer mental gymnastics.
Here's Kosaraju's in Python, enough to reconstruct it under pressure:
from collections import defaultdict def kosaraju(n: int, edges: list[tuple[int, int]]) -> list[list[int]]: graph = defaultdict(list) transposed = defaultdict(list) for u, v in edges: graph[u].append(v) transposed[v].append(u) visited = [False] * n finish_order = [] def dfs1(node): visited[node] = True for neighbor in graph[node]: if not visited[neighbor]: dfs1(neighbor) finish_order.append(node) for i in range(n): if not visited[i]: dfs1(i) visited = [False] * n sccs = [] def dfs2(node, component): visited[node] = True component.append(node) for neighbor in transposed[node]: if not visited[neighbor]: dfs2(neighbor, component) while finish_order: node = finish_order.pop() if not visited[node]: component = [] dfs2(node, component) sccs.append(component) return sccs
For large graphs (tens of thousands of nodes), convert both DFS functions to iterative to avoid hitting Python's recursion limit. Python has strong opinions about how deep you're allowed to go.
The Complexity Comparison
| Algorithm | Time | Space |
|---|---|---|
| Kosaraju's | O(V+E) | O(V+E) |
| Tarjan's | O(V+E) | O(V) |
Both visit each node and edge a constant number of times, giving O(V+E) time. The space difference comes from Kosaraju's need to store the transposed graph explicitly. Tarjan's only carries the recursion stack plus three O(V) arrays. If an interviewer asks about space trade-offs, Tarjan's is the cleaner answer.
The Interview Won't Say "SCC." Here's When You'll Need It Anyway.
Direct SCC questions are rare in standard software engineering interviews but common in competitive-programming-flavored loops (Jane Street, Citadel, HRT). The more common scenario: you need SCC thinking without the problem mentioning SCCs at all.
Cycle detection in directed graphs. If you need to know whether a directed graph contains a cycle, running SCC detection and checking if any SCC has more than one node (or a self-loop) answers the question. This is related to course scheduling problems and dependency resolution.
Finding condensation to enable topological sort. Some problems give you a directed graph with cycles and ask you to process nodes in dependency order. You can't topological-sort a cyclic graph directly. The trick: compute SCCs, condense the graph to a DAG, then topological-sort the condensation. Each super-node gets processed as a unit. See topological sort for why this matters.
Reachability across a directed graph. If a problem asks which nodes are mutually reachable, or which parts of a system always communicate back and forth, that's SCC language even when the problem doesn't say so.
2-SAT. This is the most elegant SCC application, though it shows up mainly in competitive programming. Any 2-SAT problem (decide if a boolean formula with two-variable clauses is satisfiable) reduces to finding SCCs on an implication graph. If a variable and its negation land in the same SCC, the formula is unsatisfiable. It's a beautiful reduction that makes you feel briefly smart.
The practical move for most interview contexts: recognize directed-graph cycle questions, know both SCC algorithms cold, understand that condensation produces a DAG, and reach for topological ordering once you have the condensation.
If you want to practice verbalizing this reasoning under time pressure, SpaceComplexity runs voice-based mock interviews that include graph algorithm problems where SCC knowledge comes up in follow-up questions.