Topological Sort vs DFS: Same Graph, Completely Different Goals

At some point in your graph theory education, someone told you topological sort and DFS were related. They were right. But in the same way a hammer and a house are related: one is a tool, the other is the job.

DFS is a traversal strategy; topological sort is an ordering problem. You can implement topological sort using DFS. You can also implement it without DFS at all. The confusion comes from seeing the DFS-based implementation first and concluding they're basically the same thing. They are not.

DFS Has No Opinions

Depth-first search answers one question: have I visited every reachable node? It picks a node, goes as deep as possible along one path, backtracks, and tries the next branch.

def dfs(graph: dict, node: int, visited: set) -> None:
    visited.add(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)

DFS works on any graph: directed, undirected, cyclic, acyclic. It has zero opinion about which node should come before which other node. Its job is reachability, not ranking. Asking DFS to produce a dependency ordering is like asking a dog to drive. Technically you can strap it to the wheel, but it's not going to go well.

Use DFS when you need to explore all nodes, find connected components, or traverse a tree. The output is a traversal order. That is not the same as a valid dependency ordering. For a deeper breakdown, see Depth-First Search: The Algorithm Behind Half Your Graph Problems.

Topological Sort Cares Only About Order

Topological sort answers a completely different question: given a directed graph where edges mean "A must come before B," what is a valid linear ordering of all nodes?

Classic example: courses with prerequisites. Course 0 before course 1, course 1 before course 2. A valid topological order is [0, 1, 2]. There may be other valid orderings. But every valid ordering must respect every "before" constraint.

A → B → D
    ↓
    C → E

Valid orderings: [A, B, C, D, E] or [A, B, D, C, E]
Invalid:         [B, A, C, D, E]  (A must precede B)

Topological sort only works on Directed Acyclic Graphs (DAGs). If the graph has a cycle, there is no valid ordering. You cannot take course A before B and B before A simultaneously. When you ask for a topological sort of a cyclic graph, the correct answer is "impossible," not "here's my best guess."

Implementation 1: DFS-Based Topological Sort

Here is where the confusion lives. The most common recursive topological sort implementation uses DFS internally, which makes people think they're the same thing.

The idea: run DFS, but instead of processing a node when you first visit it, add it to a result list when you finish visiting it (post-order). Reverse the list when done. That is a valid topological ordering.

def topological_sort_dfs(graph: dict, num_nodes: int) -> list[int]:
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * num_nodes
    result = []
    has_cycle = False

    def dfs(node: int) -> None:
        nonlocal has_cycle
        color[node] = GRAY
        for neighbor in graph[node]:
            if color[neighbor] == GRAY:
                has_cycle = True
                return
            if color[neighbor] == WHITE:
                dfs(neighbor)
        color[node] = BLACK
        result.append(node)   # post-order: add after all descendants finish

    for node in range(num_nodes):
        if color[node] == WHITE:
            dfs(node)

    return [] if has_cycle else result[::-1]

Why does post-order work? A node gets appended to the result only after every node it points to has already been appended. Reversing the list puts "no prerequisites" nodes first. The post-order trick is what converts DFS from a traversal into an ordering. Without the reversal, you have a useless backward list.

Cycle detection uses three states, which look a lot like a traffic light if you squint:

White (0): not yet visited
Gray (1): currently in the DFS call stack (hitting gray means you found a back edge, which means cycle)
Black (2): fully processed, no cycles downstream

Three-color DFS: valid DAG vs. cyclic graph showing how GRAY nodes detect back edges

The GRAY state is what catches cycles. If you reach a node that is still on the call stack, you have gone in a circle.

Time: O(V + E). Space: O(V) for the color array and recursion stack.

Implementation 2: Kahn's Algorithm (No DFS Required)

Kahn's algorithm does topological sort without any DFS. It is BFS-based, and you can explain it out loud without losing your interviewer.

The core idea: nodes with no incoming edges have no prerequisites. Process them first. Remove their outgoing edges. Any node that drops to in-degree 0 gets processed next. Repeat until the queue is empty.

from collections import deque

def topological_sort_kahn(graph: dict, num_nodes: int) -> list[int]:
    in_degree = [0] * num_nodes
    for node in range(num_nodes):
        for neighbor in graph[node]:
            in_degree[neighbor] += 1

    queue = deque(node for node in range(num_nodes) if in_degree[node] == 0)
    result = []

    while queue:
        node = queue.popleft()
        result.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return result if len(result) == num_nodes else []

Cycle detection is free. If the result list is shorter than the total node count, some nodes never reached in-degree 0 because a cycle trapped them. No extra bookkeeping required: short result means cycle, full result means valid order.

This is genuinely beautiful. No color arrays, no call stack concerns, no recursion depth limits. Just a queue and a counter per node.

Time: O(V + E). Space: O(V). Identical to the DFS variant.

Topological Sort vs DFS: Full Comparison

Property	DFS (plain)	Topo Sort (DFS)	Topo Sort (Kahn's)
Works on	Any graph	DAG only	DAG only
Goal	Visit all nodes	Order all nodes	Order all nodes
Cycle detection	No built-in	Gray-node check	`len(result) < V`
Uses DFS	Yes	Yes (post-order)	No (BFS)
Output	Traversal sequence	Valid linear order	Valid linear order
Time complexity	O(V + E)	O(V + E)	O(V + E)
Space complexity	O(V)	O(V)	O(V)
Stack overflow risk	On deep graphs	On deep graphs	None

Both topological sort implementations have identical time and space complexity. The choice comes down to code clarity, cycle detection style, and whether you want to avoid recursion entirely. There is no wrong answer, but there is a "harder to explain at 10am in an interview" answer.

When to Use Each

Reach for plain DFS when:

You need to explore all reachable nodes without caring about order
You are working with undirected or cyclic graphs
The problem asks "does a path exist" or "how many nodes are reachable," not "what order"

Reach for DFS-based topological sort when:

You need a valid ordering and you already think recursively
The three-color cycle detection reads cleanly to you
It is a classic prerequisite or dependency chain problem on LeetCode

Reach for Kahn's algorithm when:

You want to avoid recursion and eliminate stack overflow risk on large inputs
You need to explain the algorithm in one sentence under pressure
The problem involves processing nodes in "levels." Kahn's works naturally by BFS layer

Worked Example: Course Schedule II (LeetCode 210)

Given numCourses and a list of [course, prerequisite] pairs, return a valid order to take all courses, or an empty array if impossible.

DFS solution:

def findOrder(numCourses: int, prerequisites: list[list[int]]) -> list[int]:
    graph = [[] for _ in range(numCourses)]
    for course, prereq in prerequisites:
        graph[prereq].append(course)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * numCourses
    result = []

    def dfs(node: int) -> bool:
        if color[node] == GRAY:
            return False
        if color[node] == BLACK:
            return True

        color[node] = GRAY
        for neighbor in graph[node]:
            if not dfs(neighbor):
                return False
        color[node] = BLACK
        result.append(node)
        return True

    for i in range(numCourses):
        if not dfs(i):
            return []

    return result[::-1]

Kahn's solution:

from collections import deque

def findOrder(numCourses: int, prerequisites: list[list[int]]) -> list[int]:
    graph = [[] for _ in range(numCourses)]
    in_degree = [0] * numCourses

    for course, prereq in prerequisites:
        graph[prereq].append(course)
        in_degree[course] += 1

    queue = deque(i for i in range(numCourses) if in_degree[i] == 0)
    result = []

    while queue:
        node = queue.popleft()
        result.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    return result if len(result) == numCourses else []

Both produce correct results. For a deeper dive on graph theory foundations, see Dependencies Have an Order. Topological Sort Finds It..

In an interview, Kahn's is easier to explain out loud: "Start with courses that have no prerequisites, take them, remove their outgoing edges, repeat. If I process all courses, return the order; if the queue empties early, there's a cycle." One breath. Clean. Your interviewer nods.

The Interview Trap (It Gets Everyone)

There is a gravitational pull toward the DFS variant in interviews. It feels more algorithmic. You saw it in your algorithms course. You implemented it once at 2am before a final. You reach for it by reflex.

You do not need to. Kahn's is a perfectly valid answer and usually the cleaner one under time pressure. If you can explain it in plain English while you code it, that is actually better than producing a three-color DFS that you have to pause and re-derive from memory.

The other trap: using plain DFS when the problem requires ordering. Plain DFS gives you a traversal, not an ordering. The tell is the problem statement: does it ask which nodes you visit, or the sequence in which you process them? Ordering means topological sort. Reachability means DFS. They are not interchangeable, even though one implements the other.

Both algorithms appear constantly in graph interview problems. For a comparison of where BFS and DFS split on shortest-path work, see BFS vs DFS: One Question Decides It Every Time. Check out SpaceComplexity to practice explaining the tradeoffs out loud. Coding both variants is the easy half. Talking through when and why under interview pressure is what actually separates "strong hire" from "mixed signals."