Tarjan's Algorithm: One DFS Pass to Find All Strongly Connected Components

May 27, 202630 min read
dsaalgorithmsinterview-prepgraphs
Tarjan's Algorithm: One DFS Pass to Find All Strongly Connected Components
TL;DR
  • Tarjan's algorithm decomposes any directed graph into strongly connected components in a single DFS pass: O(V+E) time, O(V) space.
  • Four structures drive it: disc (discovery time), low (minimum disc reachable via back edges), onStack (O(1) stack membership), and the DFS stack itself.
  • A vertex v is an SCC root when disc[v] == low[v]; pop the stack down to v and that collection is exactly one SCC.
  • The onStack check is non-optional: without it, cross edges to finished SCCs silently corrupt low-link values and merge separate SCCs.
  • SCCs come out in reverse topological order of the condensation DAG, so ordering is free with no second pass.
  • Reduces directly to 2-SAT: build an implication graph, run SCC, check whether any variable shares an SCC with its negation.
  • Beats Kosaraju's algorithm by avoiding graph reversal and the entire second DFS pass.

Two DFS passes felt like the minimum. Kosaraju's algorithm built on that assumption: first pass to record finish times, second pass on the reversed graph. Clean enough. Two passes. Done. Then Tarjan looked at the same problem in 1972 and decided one was enough.

He was right. One DFS. Every cycle cluster found. No graph reversal. The algorithm, published in SIAM Journal on Computing that year, is still the standard tool for decomposing directed graphs into strongly connected components fifty years later.

A strongly connected component is a maximal set of vertices where every vertex can reach every other. Think of each SCC as a "black hole" region of the graph: once you enter, you can visit every other node in it, but you cannot escape to any earlier region. The condensation of those components into single nodes always forms a DAG. That DAG is the skeleton of the graph's dependency structure.

You reach for Tarjan's algorithm when you need to find that skeleton: detect cycles in a directed graph, solve 2-SAT in linear time, identify deadlocked processes, or preprocess a graph for dynamic programming that would otherwise be blocked by cycles.

What Is a Strongly Connected Component?

Take a directed graph. Vertices u and v are in the same SCC if there is a directed path from u to v and a directed path from v to u. An SCC is the maximal such group: you cannot grow it further without breaking the mutual reachability.

     0 ──► 1
     ▲     │
     │     ▼
     3 ◄── 2     4 ──► 5
                 ▲     │
                 └─────┘

In this graph, vertices 0, 1, 2, 3 all reach each other (0→1→2→3→0), so they form one SCC. Vertex 4 and vertex 5 form a second SCC (4→5→4). The edge from vertex 3 to vertex 4 connects these two SCCs but creates no cycle between them.

Directed graph showing SCC 1 containing vertices 0, 1, 2, 3 highlighted in blue with a four-cycle, and SCC 2 containing vertices 4 and 5 highlighted in amber with a mutual cycle. A one-way bridge edge runs from vertex 3 to vertex 4. Two SCCs: the blue group forms the cycle 0→1→2→3→0, the amber group cycles 4→5→4. The bridge from 3 to 4 crosses into SCC 2 but cannot return.

Contracting each SCC to a single node always produces a directed acyclic graph. That DAG is called the condensation. Tarjan's algorithm finds all SCCs and, as a free bonus, outputs them in reverse topological order of the condensation.

A four-panel webcomic: two engineers discuss circular dependencies, one draws five modules (A through E) connected in a pentagram of arrows all importing each other, and the punchline is "stand in the center while I blow the Horn of Abraxas" Every SCC with more than one vertex looks exactly like this. Tarjan's algorithm finds all of them in one linear pass. No horns required.

Four Structures Drive the Algorithm

The algorithm runs a single DFS and tracks four things: two integer arrays, one boolean array, and one explicit stack.

StructureTypePurpose
disc[v]integerDiscovery time when v was first visited
low[v]integerMinimum disc value reachable from v's subtree
stackstack of verticesActive vertices not yet assigned to an SCC
onStack[v]booleanO(1) membership check for the stack

The discovery time is a global counter that increments once per vertex. The low-link value is the subtle one.

What Low-Link Actually Means

low[v] is the minimum discovery time reachable from v's subtree, counting only tree edges and back edges to vertices still on the stack. Edges to vertices already popped from the stack (vertices already assigned to a finished SCC) do not count.

In other words: how far back in time can I reach from here? If low[v] is less than disc[v], there is an ancestor above you that your subtree can still touch. You are not the root of anything. But if low[v] == disc[v], you are as far back as you can go. Nothing below you escapes to earlier ancestry. That is your SCC boundary.

This restriction is what makes the algorithm correct. Without it, a cross edge from one SCC to an already-finished SCC would incorrectly pull down the low-link value, merging two separate SCCs into one.

How Low-Link Values Are Updated

When the DFS visits vertex v:

  1. Set disc[v] = low[v] = timer++, push v onto the stack, set onStack[v] = true.
  2. For each neighbor w:
    • If w is unvisited (tree edge): recurse into w, then set low[v] = min(low[v], low[w]).
    • If w is visited and on the stack (back edge into current SCC): set low[v] = min(low[v], disc[w]).
    • If w is visited and not on the stack (cross edge to a finished SCC): ignore.
  3. After processing all neighbors: if disc[v] == low[v], v is the root of an SCC. Pop everything from the stack down to v (inclusive). That collection of vertices is one SCC.
Pseudocode
──────────
timer ← 0
disc[v] ← low[v] ← −1 for all v

function dfs(v):
    disc[v] ← low[v] ← timer++
    push v onto stack; onStack[v] ← true

    for each neighbor w of v:
        if disc[w] == −1:
            dfs(w)
            low[v] ← min(low[v], low[w])
        else if onStack[w]:
            low[v] ← min(low[v], disc[w])

    if disc[v] == low[v]:
        scc ← []
        repeat:
            node ← pop from stack
            onStack[node] ← false
            scc.append(node)
        until node == v
        output scc

for each vertex v:
    if disc[v] == −1: dfs(v)

Walking Through the Trace

Let's trace the algorithm on the graph from earlier: edges 0→1, 1→2, 2→3, 3→0, 3→4, 4→5, 5→4.

DFS starts at vertex 0
────────────────────────────────────────────────────────────
Visit 0: disc=0, low=0   stack=[0]
  Visit 1: disc=1, low=1   stack=[0,1]
    Visit 2: disc=2, low=2   stack=[0,1,2]
      Visit 3: disc=3, low=3   stack=[0,1,2,3]
        Neighbor 0: on stack → low[3] = min(3, disc[0]) = 0
        Neighbor 4:
          Visit 4: disc=4, low=4   stack=[0,1,2,3,4]
            Visit 5: disc=5, low=5   stack=[0,1,2,3,4,5]
              Neighbor 4: on stack → low[5] = min(5, disc[4]) = 4
              disc[5]==low[5]? 5 ≠ 4. Not a root. Back to 4.
            low[4] = min(4, low[5]) = 4
            disc[4]==low[4]? 4==4 → SCC root
            Pop until 4: SCC = {5, 4}. onStack[4]=onStack[5]=false.
          low[3] = min(0, low[4]) = 0
        disc[3]==low[3]? 3 ≠ 0. Not a root.
      low[2] = min(2, low[3]) = 0
    low[1] = min(1, low[2]) = 0
  low[0] = min(0, low[1]) = 0
  disc[0]==low[0]? 0==0 → SCC root
  Pop until 0: SCC = {3, 2, 1, 0}

Output: [{5,4}, {3,2,1,0}]   ← reverse topological order

Vertex 4 is an SCC root because disc[4] == low[4] == 4. The back edge 5→4 updates low[5] to 4, but when the DFS returns to 4, low[4] is still 4. So vertex 4 is where the SCC boundary sits.

The final output is in reverse topological order: the {5,4} leaf SCC comes out before {0,1,2,3}. If you need forward topological order of the condensation DAG, reverse the output list.

DFS tree showing the discovery and low-link values for each vertex. Vertices 0 and 4 are highlighted with amber borders as SCC roots. Back edges from vertex 3 to vertex 0 and from vertex 5 to vertex 4 are shown as dashed amber arrows. Pop annotations show which vertices are collected into each SCC. The DFS tree after the algorithm runs. Amber borders mark SCC roots. Each back edge pulls a low value toward an ancestor. When the DFS unwinds back to a vertex where disc equals low, that vertex pops its SCC from the stack.

Time and Space Complexity

OperationTimeSpaceNotes
Build adjacency listO(V + E)O(V + E)One-time setup
Full SCC decompositionO(V + E)O(V)Single DFS pass
SCC root checkO(1) amortizedO(1)One comparison per vertex
Stack pop (per SCC)O(k)O(1)k = SCC size; total across all SCCs = O(V)

Why O(V + E) Time

The key observation is that every vertex is pushed onto the stack exactly once and popped exactly once, and every edge is examined exactly once.

The DFS visits each vertex once: O(V) recursive calls. Each call iterates over that vertex's outgoing edges: summed across all vertices, that's O(E). The onStack boolean array gives O(1) membership checks, so the back edge vs. cross edge distinction costs nothing extra. Stack pushes and pops total O(V) across the entire algorithm, not per SCC.

This is the same argument as DFS being O(V + E): the only difference is the constant work of maintaining disc, low, and onStack.

Why O(V) Space

Four arrays of length V plus the DFS call stack: total O(V) space, not counting the adjacency list you started with.

The algorithm needs disc, low, onStack, and the DFS call stack (which in the worst case, a linear chain of vertices, holds all V frames simultaneously). The SCC stack itself holds at most V vertices.

Why disc[v] == low[v] Identifies the SCC Root

This is the crux.

Say the DFS has finished processing all of v's descendants. If any vertex in v's subtree had a back edge reaching an ancestor of v (a vertex with a smaller discovery time that is still on the stack), then low[v] would be less than disc[v]. That means v is not the "top" of its SCC: there's an ancestor that belongs to the same SCC, and it will be discovered as the root when the DFS unwinds back to that ancestor.

If low[v] == disc[v], no vertex reachable from v's subtree can reach anything discovered before v that is still unfinished. That means v's SCC is completely contained within v's DFS subtree, and v is the highest vertex in that subtree. Popping everything from the stack down to v gives exactly the SCC.

You Will Try to Remove onStack. Don't.

At some point you will look at the four data structures and think you can simplify. You already have disc[w] != -1 to test whether a vertex was visited. The onStack boolean array seems redundant. You remove it. You test on the example graph. It works. You try three more cases. They all pass. Then you submit and get WA on a case with cross edges.

Consider an edge from vertex A to vertex B, where B has already been fully processed and assigned to its own SCC. Without the onStack check, you'd update low[A] = min(low[A], disc[B]). Since disc[B] might be very small (B was discovered early), this could pull down low[A] below disc[A], making A think it's connected to earlier ancestry. A would never become an SCC root, and neither would any vertex above it in the current DFS path.

The onStack check says: only edges to currently-active vertices contribute to low-link updates. Edges to finished SCCs are irrelevant, because those SCCs are already sealed off.

Side-by-side diagram. Left panel shows a back edge from vertex v to vertex w, both on the stack, with the formula low[v] = min(low[v], disc[w]) and a green update indicator. Right panel shows a cross edge from vertex v to a finished vertex w not on the stack, with a red do-not-update indicator. Back edge (left): w is on the stack, so the low-link update is correct and propagates the earlier discovery time upward. Cross edge (right): w is finished, already in its own SCC. Updating here would corrupt low[v] and silently merge separate SCCs.

You Get Reverse Topological Order for Free

This is a free property that most explanations underemphasize. Because the DFS discovers "leaf" SCCs (those with no successors in the condensation) first, they get output first. Collect all the SCC lists in order and you have a valid reverse topological ordering of the condensation DAG. Reverse that list and you have a valid topological ordering. No second pass needed. No extra work. You already did it during the SCC decomposition.

One Structure, Every Language

All fourteen implementations below use the same four-structure approach: disc, low, onStack, and an explicit DFS stack (or recursion). Each implementation is complete and runnable. Note that Python's default recursion limit of 1000 frames means graphs with long chains may need sys.setrecursionlimit(n + 10) or an iterative port.

from collections import defaultdict import sys sys.setrecursionlimit(10_000) def tarjan_scc(n: int, edges: list[tuple[int, int]]) -> list[list[int]]: graph: dict[int, list[int]] = defaultdict(list) for u, v in edges: graph[u].append(v) disc = [-1] * n low = [0] * n on_stack = [False] * n stack: list[int] = [] timer = [0] sccs: list[list[int]] = [] def dfs(v: int) -> None: disc[v] = low[v] = timer[0] timer[0] += 1 stack.append(v) on_stack[v] = True for w in graph[v]: if disc[w] == -1: dfs(w) low[v] = min(low[v], low[w]) elif on_stack[w]: low[v] = min(low[v], disc[w]) if disc[v] == low[v]: scc: list[int] = [] while True: node = stack.pop() on_stack[node] = False scc.append(node) if node == v: break sccs.append(scc) for v in range(n): if disc[v] == -1: dfs(v) return sccs

What Tarjan's Algorithm Actually Solves

Any Multi-Vertex SCC Contains a Cycle

Any SCC with more than one vertex contains at least one directed cycle. Any single-vertex SCC with a self-loop also forms a cycle. If you want to know whether a directed graph is acyclic, check whether every SCC has exactly one vertex. If any SCC has two or more, there's a cycle.

After decomposition, the condensation DAG makes the structure explicit. Each SCC becomes one node and every inter-SCC edge becomes a DAG edge. The graph goes from "tangled directed graph with possible cycles" to "clean dependency order you can reason about."

Condensation DAG showing SCC 1 containing vertices 0, 1, 2, and 3 as a blue pill node on the left, connected by a single directed arrow to SCC 2 containing vertices 4 and 5 as an amber pill node on the right. Caption notes that output order is SCC 2 first, SCC 1 second, which is reverse topological order. The condensed graph. Two SCCs, one DAG edge between them. Tarjan outputs SCC 2 first because it is a leaf in the condensation, discovered and sealed before the DFS unwinds back to SCC 1's root.

2-SAT Reduces Directly to SCC

A 2-SAT instance is a boolean formula where every clause has exactly two literals: (a OR b), (NOT c OR d), and so on. The problem is solvable in linear time by building an implication graph and finding its SCCs.

For each clause (a OR b), add edges NOT_a → b and NOT_b → a (if a is false, b must be true, and vice versa). Run Tarjan's algorithm. If any variable x and its negation NOT_x appear in the same SCC, the formula is unsatisfiable (x would have to be both true and false). Otherwise, you can read off a valid assignment: x is true if x's SCC comes later in topological order than NOT_x's SCC. Because Tarjan outputs SCCs in reverse topological order, "later in topological order" means "earlier in the sccs list."

Condense First, Then Run DP

Many DP problems assume a DAG. If the input graph has cycles, collapse each SCC into a single node using Tarjan, then run DP on the condensation. The condensation is always a DAG. Finding the longest path in a graph with cycles, or counting reachable nodes with cycle avoidance, becomes tractable after condensation. The SCC decomposition step runs in O(V + E) and the resulting condensation has at most V nodes and E edges, so you pay nothing asymptotically.

Deadlock Is a Cycle in Disguise

In a resource-allocation graph, a cycle among processes and resources indicates deadlock. The SCC containing a cycle is the deadlocked cluster. Tarjan identifies every such cluster in one linear pass, which is exactly what operating system schedulers need.

Compilers Find Loops the Same Way

In control flow graphs, SCCs correspond to loops. Optimizers use this to find loops eligible for vectorization, hoisting, or strength reduction. LLVM exposes this through its scc_iterator class, which wraps Tarjan's algorithm directly.

When to Use Tarjan's Algorithm

These five signals suggest Tarjan's SCC is the right tool.

  1. "Can every node reach every other node?" You are checking if the entire graph is one SCC.
  2. "Find all nodes that are part of a cycle" in a directed graph. Any node in a multi-vertex SCC is on a cycle.
  3. "Simplify a cyclic graph into a DAG" for further processing. Condense into SCCs, then proceed.
  4. "Is this boolean formula satisfiable?" where clauses have two literals. Build the implication graph and run SCC.
  5. "Detect deadlock / circular dependency" in a system modeled as a directed graph.

Tarjan SCC is for directed graphs. For undirected graphs, you want connected components (BFS/DFS or Union-Find) or biconnected components (a different bridge-finding variant that also uses disc/low arrays but with different semantics).

Worked Example 1: Course Schedule with Circular Prerequisites

Problem: Given a list of courses and prerequisites, determine which courses are part of a circular dependency. Return the list of courses that can never be completed.

Why SCC fits: A circular dependency is exactly a directed cycle. Any course in a multi-vertex SCC is in a cycle and can never be completed (you'd need to complete course A to complete B, but B to complete A).

Approach:

  1. Build a directed graph where edge u → v means "u is a prerequisite for v."
  2. Run Tarjan's SCC.
  3. Any SCC with size > 1 contains stuck courses. Return all their members.

If a single-vertex SCC has a self-loop (a course that is its own prerequisite), it's also stuck. Check has_self_loop[v] separately if needed, or look for self-edges in the input.

Complexity: O(V + E) where V = courses, E = prerequisites.

Worked Example 2: 2-SAT (LeetCode 1 variant)

Problem: You have n boolean variables. You are given a list of clauses, each saying "(var_i is true OR var_j is false)" or similar two-literal constraints. Is there an assignment satisfying all clauses?

Why SCC fits: 2-SAT has a classic linear reduction to SCC. Each clause (a OR b) encodes two implications (NOT_a → b, NOT_b → a).

Approach:

  1. Create 2n nodes: one for each variable and one for its negation. Conventionally, variable i maps to node i, and NOT_i maps to node i + n.
  2. For each clause (a OR b): add edge NOT_a → b and NOT_b → a.
  3. Run Tarjan's SCC.
  4. For each variable i: if i and i + n are in the same SCC, output "unsatisfiable."
  5. Otherwise, output "satisfiable." Variable i is assigned true if its SCC index is greater than that of NOT_i in the condensation order (using Tarjan's reverse-topological output, this means: check which SCC was output earlier in the sccs list).

Why the SCC index works: If i and NOT_i are in different SCCs, one SCC can reach the other but not vice versa. Assigning truth value based on which SCC comes later in topological order is guaranteed to satisfy all implications, and therefore all clauses.

Complexity: O(V + E) where V = 2 * (number of variables) and E = 2 * (number of clauses).

The Bugs You Will Write

Removing onStack to simplify the code: You looked at the four structures and thought the boolean array was redundant. You had disc[w] != -1 for visited checks already. Your simplified version passed every test you ran. Then it failed on a graph with cross edges pointing back to an already-finished SCC. Without onStack, cross edges look identical to back edges and corrupt low-link values silently. The SCCs come out wrong and there is no error message telling you why.

Using low[w] instead of disc[w] for back edges: Most textbook presentations use disc[w] when updating from a back edge (neighbor is on the stack but already visited). Using low[w] also gives correct results in the onStack version because low[w] can only point to vertices that are also on the stack at that moment. But disc[w] is the cleaner semantic and matches Tarjan's original 1972 paper. Pick one and stick with it.

Python recursion limit: The default limit is 1000 frames. A linear chain of 1001 vertices will crash. Not a stack overflow from the OS. A Python-level RecursionError at exactly 1000 calls, before the OS gets a chance to complain. Add sys.setrecursionlimit(n + 10) at the top, or convert to an explicit iterative stack for any graph that might be large.

Confusing Tarjan SCC with Tarjan's bridge-finding: Both use disc and low arrays, both are O(V + E). But bridge-finding runs on undirected graphs, uses parent tracking to avoid trivially returning on tree edges, and uses strict < instead of <= in the low-link update. The two algorithms share a skeleton but solve different problems. Using SCC code for bridge-finding gives wrong results in the other direction too.

The Short Version

  • Tarjan's algorithm finds all SCCs in a directed graph in one DFS pass: O(V + E) time, O(V) space.
  • It maintains three arrays (disc, low, onStack) and one stack.
  • low[v] is the minimum discovery time reachable from v's subtree via tree edges plus back edges to the current stack.
  • When disc[v] == low[v], v is an SCC root. Pop the stack to v.
  • The onStack check is essential: without it, cross edges to finished SCCs corrupt low-link values.
  • SCCs come out in reverse topological order of the condensation DAG.
  • Applications: cycle detection, 2-SAT, condensation for DP, deadlock detection, compiler loop analysis.
  • Beats Kosaraju's algorithm by avoiding graph reversal and a second DFS pass.

If you want to practice this under interview conditions, SpaceComplexity runs voice-based DSA mock interviews with real-time rubric feedback across communication, problem-solving, and code quality. It's a faster way to build the muscle than silent LeetCode grinding.

For more on the graph algorithms that pair with Tarjan's, see the topological sort deep dive and the DFS pattern recognition guide. If you end up using SCC for 2-SAT or condensation DP, the when to use dynamic programming article covers the DP-on-DAG patterns that follow.

Further Reading