Dependencies Have an Order. Topological Sort Finds It.

Make, npm, Gradle, apt, pip, Cargo. Different tools, different ecosystems, different ways to ruin your Friday deploy. One algorithm underneath all of them: topological sort.

The problem is always the same: you have a set of tasks where some cannot start until others have finished. Compile this file before that one. Install this package before the one that depends on it. Take Algorithms before Advanced Algorithms. The structure is a directed graph where an arrow from A to B means "A must come before B." Your job is to produce a valid ordering.

There's a catch. If A depends on B and B depends on A, no valid ordering exists. The circular dependency breaks everything. So your algorithm has a second job: detect that cycle and report it before anyone tries to execute anything.

Topological sort does both: it finds a valid linear ordering of a Directed Acyclic Graph (DAG), and it tells you when the graph contains a cycle that makes ordering impossible. Time complexity is O(V + E) for both operations, where V is the number of vertices and E is the number of edges.

Two Algorithms, One Problem

There are two standard approaches. Both run in O(V + E). They think about the problem from opposite ends of the dependency chain.

Kahn's Algorithm: Work From What's Ready

Arthur Kahn published this in Communications of the ACM in 1962, as a byproduct of work at Westinghouse. That's right: the algorithm powering your npm install predates the first commercial modem. The intuition is grounded: a task is ready to execute when all its prerequisites have completed. Start with the tasks that have no prerequisites, run them, cross them off, and see what becomes available next.

In graph terms, a node's in-degree is the count of edges pointing into it. A node with in-degree 0 has no unmet dependencies. It can go first.

The algorithm:

Compute in-degree for every vertex by walking all edges.
Push every zero-in-degree vertex into a queue.
While the queue is not empty: dequeue a vertex u, append it to the result, then for each neighbor v of u, decrement in-degree[v]. If it hits 0, enqueue v.
If the result contains fewer than V vertices, a cycle exists.

The cycle detection is implicit. Every vertex in a cycle has at least one incoming edge from another cycle member. That means no cycle vertex ever reaches in-degree 0. No cycle vertex ever enters the queue. When the queue empties early, the unprocessed vertices are exactly those stuck in cycles.

Here is a walkthrough on a small graph. Edges: A→B, A→D, B→C, D→C, D→E.

Initial in-degrees:
  A=0  B=1  C=2  D=1  E=1

Queue: [A]
Process A → decrement B (now 1→0), D (now 1→0)
Queue: [B, D]
Process B → decrement C (now 2→1)
Queue: [D, C? no, C still 1]
Process D → decrement C (now 1→0), E (now 1→0)
Queue: [C, E]
Process C → no outgoing edges
Process E → no outgoing edges

Result: [A, B, D, C, E]  ✓  (|result| == 5 == V)

Kahn's algorithm step-by-step: directed graph A→B, A→D, B→C, D→C, D→E with in-degree badges and a queue processing table

Node A starts in the queue (in-degree 0). Each step reveals the next ready nodes.

DFS-Based: Work From the End

The DFS approach goes the other direction. Run a depth-first search on the graph. When you finish exploring a node (after all nodes reachable from it have been fully processed), push it onto a stack. Once everything is done, pop the stack. That reverse post-order is your topological sort.

Why does this work? When node u finishes in DFS, every node reachable from it has already finished. If there's an edge u→v, v was explored from u and finished first, so v is appended to the result list before u. Reversing the list puts u before v. That's exactly right: u→v means u should come first.

For cycle detection in the DFS approach, you need three states, not two. A single boolean visited array will produce false positives on directed graphs.

Consider:

A → C
B → C

If DFS from A visits and marks C as done, then DFS from B reaches C (already visited). A naive check says "visited node encountered, must be a cycle." That's wrong. C is just a shared dependency with two parents. No cycle. Your detector just refused to build a perfectly valid graph.

The fix is three-color marking, borrowed from the CLRS description of DFS:

White: not yet visited
Gray: currently on the active recursion stack
Black: fully processed, all descendants explored

Only encountering a gray node signals a cycle. Gray means the node is an ancestor in the current call stack, which means you've found a back edge. Encountering a black node just means you've reached a previously finished node from a different path. That's fine. Two people both depending on the same library isn't a circular dependency, no matter what your gut says.

def dfs_topological_sort(graph, num_vertices):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * num_vertices
    result = []
    has_cycle = False

    def dfs(vertex):
        nonlocal has_cycle
        if has_cycle:
            return
        color[vertex] = GRAY
        for neighbor in graph[vertex]:
            if color[neighbor] == GRAY:
                has_cycle = True
                return
            if color[neighbor] == WHITE:
                dfs(neighbor)
        color[vertex] = BLACK
        result.append(vertex)  # post-order: push after all descendants

    for v in range(num_vertices):
        if color[v] == WHITE:
            dfs(v)

    if has_cycle:
        return None
    return result[::-1]  # reverse the post-order

Three-color DFS: left shows a shared dependency (A->C, B->C) with all nodes BLACK indicating no cycle; right shows a cycle (A->B->C->A) with all nodes GRAY and the back edge highlighted in red

Left: two parents sharing one node is fine. Right: encountering a GRAY ancestor means you've looped back. Cycle confirmed.

What's Actually Happening in Memory

Both algorithms use an adjacency list: an array of size V where position i holds all vertices that i points to. This is the critical choice that makes the complexity O(V + E) rather than O(V²).

With an adjacency matrix, iterating over a vertex's neighbors costs O(V) even if the vertex has only one edge. With an adjacency list, that same iteration costs O(degree(v)). Summed over all vertices, that's O(E).

Graph with 5 vertices (0-indexed):
Edges: 0→1, 0→3, 1→2, 3→2, 3→4

Adjacency list:
  0: [1, 3]
  1: [2]
  2: []
  3: [2, 4]
  4: []

In-degrees (Kahn's):
  0=0  1=1  2=2  3=1  4=1
         ^         ^
         |         |
  B gets one    D gets one
  from A        from A

Space breakdown: the adjacency list holds V list pointers and E total neighbor entries, giving O(V + E) total. The in-degree array and queue each hold O(V) elements. The result is O(V).

Adjacency list for a 5-vertex DAG with in-degree array alongside: vertex 0 (in-degree 0, highlighted blue) points to neighbors 1 and 3; vertex 2 (in-degree 2, highlighted amber) is a sink

The adjacency list visits each edge exactly once during traversal. That's what makes O(V + E) possible.

Why O(V + E)? The Accounting

Think of it like a bar tab: assign each unit of work to a vertex or an edge, and show each is paid for exactly once. Nobody gets charged twice.

Kahn's algorithm:

Building in-degrees walks every edge once. That charges O(E). Initializing the queue checks every vertex once: O(V). The main loop dequeues each vertex exactly once (it enters the queue exactly when its in-degree first hits 0, and it never re-enters). Total dequeue operations: O(V). When processing a dequeued vertex, you walk its outgoing edges. Each edge u→v is walked exactly once, when u is being processed. Total edge work across all dequeue operations: O(E). Grand total: O(V + E).

DFS algorithm:

Each vertex is visited exactly once (the color check prevents revisiting). O(V). When visiting a vertex, you iterate its adjacency list. Each edge u→v is traversed exactly once, when u is being visited. O(E). Stack operations and reversal: O(V). Grand total: O(V + E).

The key guarantee: every vertex is processed exactly once, and every edge is examined exactly once. Nothing is charged twice.

Operations at a Glance

Operation	Time	Space	Notes
Build adjacency list	O(V + E)	O(V + E)	One pass over edge list
Topological sort (Kahn's)	O(V + E)	O(V)	BFS, iterative, no recursion depth risk
Topological sort (DFS)	O(V + E)	O(V)	O(V) for call stack in worst case
Cycle detection	O(V + E)	O(V)	Free in both; Kahn's count check vs. DFS gray detection
Lexicographically smallest ordering	O(V log V + E)	O(V)	Replace queue with min-heap in Kahn's
Check if ordering is unique	O(V + E)	O(V)	Unique iff Kahn's queue never has more than one element

The ordering is unique if and only if at every step in Kahn's, the queue holds exactly one element. That condition means every vertex has exactly one possible successor in the ordering, which means the graph is a directed Hamiltonian path: a single chain with no branching. Most real dependency graphs have multiple valid orderings, and any one of them is correct.

One Structure, Every Language

Kahn's algorithm (BFS-based) with cycle detection, in all major languages. Each function returns the topological order on success, or null/None/false (depending on the language) when a cycle is detected.

from collections import deque

def topological_sort(num_vertices: int, edges: list[tuple[int, int]]) -> list[int] | None:
    graph: list[list[int]] = [[] for _ in range(num_vertices)]
    in_degree = [0] * num_vertices

    for source, dest in edges:
        graph[source].append(dest)
        in_degree[dest] += 1

    queue: deque[int] = deque(v for v in range(num_vertices) if in_degree[v] == 0)
    result: list[int] = []

    while queue:
        vertex = queue.popleft()
        result.append(vertex)
        for neighbor in graph[vertex]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return result if len(result) == num_vertices else None


# Example
edges = [(0, 1), (0, 3), (1, 2), (3, 2), (3, 4)]
print(topological_sort(5, edges))  # [0, 3, 1, 4, 2] or similar valid ordering
print(topological_sort(3, [(0, 1), (1, 2), (2, 0)]))  # None (cycle)

What Problems It Solves

Topological sort is the right tool whenever you have a partial order: a set of items with "must come before" relationships, and you need to linearize them.

Dependency installation. npm, pip, apt, Cargo. Every package manager builds a DAG from package metadata and resolves installation order via topological sort. Circular dependencies cause a hard failure (because there's no valid ordering).

Build systems. Make, Bazel, Gradle, CMake. Each build target declares its dependencies. The build system topologically sorts targets to determine compilation order. This is also why incremental builds work: only the targets downstream of a changed file need rebuilding.

Task scheduling. Apache Airflow, Kubernetes operators, CI/CD pipelines. Workflow systems express task dependencies as a DAG and execute tasks in topological order. Parallelism is natural: tasks with no ordering relationship between them can run concurrently.

DP on DAGs. Topological sort unlocks linear-time dynamic programming on directed acyclic graphs. If you process vertices in topological order, you're guaranteed that when you compute state for vertex v, all states for vertices with edges into v have already been computed. Longest path in a DAG, critical path analysis, and DAG-based shortest paths all exploit this.

Spreadsheet evaluation. Cells form a dependency graph (B2 depends on A1 and C3). Recomputing in topological order ensures every cell is computed after its inputs. A circular reference (A1 depends on B2 depends on A1) is detected as a cycle.

Compiler instruction scheduling. Modern CPUs execute instructions out of order, subject to data dependencies. The compiler builds a dependency DAG over instructions and topologically sorts it to find valid instruction schedules that respect data flow.

Recognizing the Topological Sort Pattern

The clearest signal is the phrase "must come before," in any wording. Prerequisites, dependencies, ordering constraints, sequencing requirements. When you see any of those, draw a directed graph in your head, and ask whether topological sort applies.

Concrete signals:

The problem gives you pairs (A, B) meaning "A before B" or "A required for B"
You need to detect whether a valid ordering exists
You need to find all items in dependency order
The word "cycle" appears as something to detect or avoid
The problem has a DAG in disguise (course prerequisites, task scheduling, file compilation)

Worked Example 1: Course Schedule (LeetCode 207 and 210)

Problem statement: There are n courses numbered 0 through n-1. You're given a list of prerequisite pairs where [a, b] means you must take course b before course a. Can you finish all courses? (207). If yes, return one valid order (210).

Why topological sort: The problem is a dependency graph in thin disguise. Build a directed graph with an edge from b to a for each prerequisite pair [a, b]. That models "b must come before a." Run Kahn's. If the result has n vertices, all courses are finishable, and that ordering solves 210. Fewer than n means a cycle exists, so 207 returns false.

def can_finish(num_courses: int, prerequisites: list[list[int]]) -> bool:
    # prerequisite [a, b] means b -> a (take b before a)
    graph = [[] for _ in range(num_courses)]
    in_degree = [0] * num_courses

    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1

    from collections import deque
    queue = deque(c for c in range(num_courses) if in_degree[c] == 0)
    processed = 0

    while queue:
        course = queue.popleft()
        processed += 1
        for next_course in graph[course]:
            in_degree[next_course] -= 1
            if in_degree[next_course] == 0:
                queue.append(next_course)

    return processed == num_courses  # False means cycle exists

The trap: getting the edge direction backwards. The pair [a, b] means "take b before a," so the edge is b→a. If you build it as a→b you'll sort in the wrong order and your cycle detection will flag non-cyclic graphs. It's a quiet bug. Everything runs, nothing breaks immediately, and then your course schedule is backwards.

Worked Example 2: Alien Dictionary (LeetCode 269)

Problem statement: You're given a list of words sorted lexicographically in an alien language. Determine the order of the alphabet in that language.

Why topological sort: Each pair of adjacent words in the list gives you one ordering constraint. Compare "wrt" and "wrf": they share "wr" but differ at position 2, so t comes before f in the alien alphabet. That's a directed edge t→f. Do this for every adjacent word pair to build the graph, then topologically sort the character nodes.

from collections import deque, defaultdict

def alien_order(words: list[str]) -> str:
    graph = defaultdict(set)
    in_degree = {char: 0 for word in words for char in word}

    for i in range(len(words) - 1):
        word1, word2 = words[i], words[i + 1]
        min_length = min(len(word1), len(word2))
        # Detect invalid input: longer word before its prefix
        if len(word1) > len(word2) and word1[:min_length] == word2[:min_length]:
            return ""
        for j in range(min_length):
            if word1[j] != word2[j]:
                if word2[j] not in graph[word1[j]]:
                    graph[word1[j]].add(word2[j])
                    in_degree[word2[j]] += 1
                break  # only the first difference gives a constraint

    queue = deque(c for c in in_degree if in_degree[c] == 0)
    result = []

    while queue:
        char = queue.popleft()
        result.append(char)
        for neighbor in graph[char]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return "".join(result) if len(result) == len(in_degree) else ""

The non-obvious part: you only get one constraint per adjacent word pair (the first character where they differ). Once you've found the differing position, break. Also, "abc" before "ab" is an invalid input: a word cannot come before its own prefix. Catch this before building the graph.

Kahn's vs DFS: Which One to Use

Both are O(V + E). The choice comes down to what the problem requires.

Use Kahn's (BFS) when:

You want an iterative implementation (no recursion depth risk for large graphs)
You need to detect cycles cleanly with just a count check
You want the lexicographically smallest ordering (swap in a min-heap)
You're building a pipeline that processes tasks as they become ready (Kahn's naturally emits nodes in "ready to run" order)

Use DFS when:

You're already running DFS for another reason (graph exploration, SCC detection)
You want to know exactly which node is part of a cycle (the gray node you hit is on the cycle)
The recursive structure maps more cleanly to the problem

In interviews, Kahn's is usually preferred because it's iterative, handles disconnected graphs automatically (you seed the queue with all zero-in-degree nodes, not just one starting vertex), and the cycle detection is explicit. There's no forgetting to reverse a result, and there's no hidden assumption about graph connectivity. It's also easier to explain to someone watching you over a video call when your brain is running at 60%.

The Non-Obvious Bits

A few things that trip people up in practice:

Isolated vertices still belong in the output. A vertex with no incoming and no outgoing edges has in-degree 0 and will enter the queue immediately. It gets placed in the result. Don't filter it out; it's a valid part of the ordering.

Disconnected graphs work without extra handling in Kahn's. You initialize the queue with every zero-in-degree vertex. There's no "run from each unvisited source" logic needed. In DFS you do need to loop over all vertices to catch disconnected components, which is an easy thing to miss.

The visited vs in-stack mistake. As shown earlier, using a single boolean visited array for cycle detection in DFS on a directed graph produces false positives. Shared nodes (reachable via multiple paths) get misidentified as cycles. Three colors fix this. Kahn's count check sidesteps the problem entirely.

Wrong edge direction is the most common bug. If the problem says [a, b] means "a depends on b," the edge should be b→a (b must come before a). If you build it as a→b you get a valid-looking topological sort of the reversed graph. It compiles. It runs. The output looks plausible. It's completely wrong. This is the bug that makes you question whether you actually understand directed graphs, at hour three, during a take-home.

You can use a min-heap in Kahn's for lexicographic order. The standard queue gives you one valid ordering, not necessarily the smallest. Replace deque with heapq in Python (or PriorityQueue in Java), and you get the lexicographically smallest topological sort. The complexity becomes O(V log V + E) because each heap operation costs O(log V).

Recap

Topological sort produces a linear ordering of vertices in a DAG such that for every edge u→v, u appears before v.
Only valid on DAGs. Any cycle makes ordering impossible.
Two algorithms: Kahn's (BFS, iterative, in-degree tracking) and DFS (post-order reversal, three-color cycle detection).
Both run in O(V + E) time and O(V) space.
Kahn's detects cycles via count check: if |result| < V, a cycle exists.
DFS detects cycles via gray nodes: encountering a gray node during DFS means a back edge, which means a cycle.
A boolean visited is not sufficient for directed graph cycle detection. Use three states.
The ordering is unique if and only if Kahn's queue never holds more than one element at a time.
For lexicographically smallest ordering, replace the queue in Kahn's with a min-heap (O(V log V + E)).
Edge direction is the most common source of bugs. Get it right when building the graph.

If you want to practice explaining Kahn's algorithm out loud under pressure, SpaceComplexity runs voice-based mock interviews where you narrate your approach in real time and get rubric-based feedback on whether the interviewer actually understood what you said. The gap between knowing the algorithm and explaining it clearly is bigger than most people expect.

For more on how topological sort pairs with dynamic programming (processing states in dependency order to eliminate memoization), see our post on the dynamic programming framework. If you find yourself pattern-matching graphs in general, the sliding window article covers the adjacent technique of turning nested loops into linear passes, which comes up often in the same problems.