BFS vs DFS for Shortest Path: Why One Works and One Doesn't

- BFS guarantees shortest path in unweighted graphs because it processes every node at distance k before any node at distance k+1.
- DFS has no shortest-path guarantee — it follows one branch all the way down before backtracking, with no mechanism for minimizing path length.
- The visited-set timing matters: add nodes to visited when enqueued, not dequeued, or the same node enters the queue multiple times from different neighbors.
- Weighted graphs require Dijkstra (min-heap, O((V+E) log V)); for 0/1-weighted edges, a deque-based 0-1 BFS runs in O(V+E).
- Grid maze problems are the canonical BFS shortest path interview pattern — uniform move cost, minimize steps to reach the target.
- Interview signal: "minimum steps," "minimum transformations," or "shortest hops" with equal-cost moves means BFS, not DFS.
You have a graph. You need the shortest path. Two algorithms come to mind: BFS and DFS. One of them solves your problem. The other finds a path, declares victory, and looks you right in the eye.
BFS guarantees the shortest path in unweighted graphs. DFS does not. This isn't a subtle edge case you can reason your way around. It's a structural difference that comes from how each algorithm decides what to visit next.
This guide covers why BFS works, why DFS fails, what to use when edge weights enter the picture, and the interview patterns that test this distinction.
BFS Explores by Distance
Breadth-first search processes nodes in order of their distance from the source. It visits every node one hop away, then every node two hops away, and so on. The layered expansion is what makes it useful for shortest paths.
On a small graph:
A
/ \
B C
/ \ \
D E F
BFS from A visits in order: A, B, C, D, E, F. When it first reaches F via C, that path A → C → F has exactly 2 edges. BFS would never find a shorter route later because every node at distance 2 is already processed before any node at distance 3 gets touched. It expands like a ripple: everything one meter out before anything two meters out.
The first time BFS reaches a node, it has found the shortest path to that node. There's no way a longer exploration later discovers something better, because BFS works through distances monotonically.
The implementation tracks a visited set so nodes aren't processed twice. Once a node is added to the queue, the path that put it there is optimal.
from collections import deque def bfs_shortest_path(graph, start, end): queue = deque([[start]]) visited = {start} while queue: path = queue.popleft() node = path[-1] if node == end: return path for neighbor in graph[node]: if neighbor not in visited: visited.add(neighbor) queue.append(path + [neighbor]) return None
Time complexity: O(V + E). Space: O(V) for the queue and visited set.

DFS Finds A Path. Not the Shortest One.
Depth-first search picks a direction and follows it as far as it can before backtracking. It's not trying to minimize anything. It has one goal: to explore. Whether the path it takes is optimal is genuinely not its concern. DFS has the energy of a colleague who found one solution and stopped reading the requirements.
Using the same graph, DFS from A might visit: A, B, D, E, C, F. When it reaches F through the B subtree first, it doesn't pause to wonder if a shorter route exists. It found a path. It's done. It's going home.
A ←── start
/ \
B C
/ \
D F ←── end
\
E ── (long chain)
DFS could easily descend into B → D → E before it ever considers C. If F is only reachable through C, DFS still gets there, but via a much longer walk through everything else first.
If you want the true shortest path with DFS, you'd have to explore every possible path and track the minimum. That turns into an exponential-time algorithm for general graphs, because you lose the guarantee that early-found paths are optimal. You've reinvented a terrible version of Dijkstra.
DFS is excellent for connectivity, cycle detection, topological sort, and tree traversal. It is the wrong tool for shortest path.
Same Big-O, Completely Different Guarantees
Both algorithms run in O(V + E) time and O(V) space. That sameness tricks people into thinking they're interchangeable for shortest path. They're not.
| Property | BFS | DFS |
|---|---|---|
| Time complexity | O(V + E) | O(V + E) |
| Space complexity | O(V) | O(h), where h is max depth |
| Finds shortest path (unweighted)? | Yes | No |
| Finds any path? | Yes | Yes |
| Explores in order of distance? | Yes | No |
| Uses queue or stack? | Queue (FIFO) | Stack (LIFO) |
The space difference matters in practice. BFS holds an entire frontier level in its queue, which can be O(V). DFS's call stack is only as deep as the current path, which is O(h). On a balanced tree that's O(log V). On a path graph it's O(V). Same worst case, very different typical behavior.
Grid Shortest Path: The Interview Bread and Butter
The most common interview application is a grid maze. You're given a 2D grid with open cells and walls. Find the minimum number of steps from start to end.
from collections import deque def min_steps(grid, start, end): rows, cols = len(grid), len(grid[0]) directions = [(0,1),(0,-1),(1,0),(-1,0)] queue = deque([(start[0], start[1], 0)]) visited = {start} while queue: row, col, steps = queue.popleft() if (row, col) == end: return steps for dr, dc in directions: next_row, next_col = row + dr, col + dc if (0 <= next_row < rows and 0 <= next_col < cols and grid[next_row][next_col] == 0 and (next_row, next_col) not in visited): visited.add((next_row, next_col)) queue.append((next_row, next_col, steps + 1)) return -1
BFS works here because every step costs 1. The first time it reaches the end cell, it has taken the minimum possible steps. No wall configuration tricks it, because BFS will have visited every cell reachable in fewer steps before it considers this one.
Word Ladder (LeetCode 127) is the same idea. Each word-to-word transformation counts as one step. BFS fans out from the start word across all single-letter mutations and returns the minimum chain length when it hits the target.
When BFS Stops Working
BFS counts edges, not costs. When edges have different weights, the fewest-edges path is not the lowest-cost path. They can diverge completely.
A ---1--- B ---1--- C
\ /
------10---------
BFS says A → B → C (2 edges) is shorter than A → C (1 edge). By edge count: correct. By cost: A → B → C costs 2 and A → C costs 10, so BFS is accidentally right here. Flip those weights and BFS hands you the wrong answer without warning.
For weighted graphs with positive edge weights, use Dijkstra's algorithm. It runs in O((V + E) log V) with a binary heap, replacing BFS's FIFO queue with a min-priority queue that processes nodes by cumulative cost from the source.
For graphs where edge weights are only 0 or 1, there's a cleaner option: 0-1 BFS. Push weight-0 edges to the front of a deque, weight-1 edges to the back. Same O(V + E) as BFS, no heap required.
For negative edge weights, Dijkstra breaks. Use Bellman-Ford, which relaxes all edges V-1 times in O(VE). If the graph is a DAG, topological sort plus dynamic programming handles negative weights in O(V + E) and beats both.
The Algorithm Selection Flowchart
Two questions, in order:
Is the graph weighted?
No: use BFS. Yes: next question.
Are there negative edge weights?
No: use Dijkstra. Yes: use Bellman-Ford. If it's a DAG: topological sort with DP.
One special case worth knowing: DAGs can skip both Dijkstra and Bellman-Ford regardless of whether they have negative weights. Relax edges in topological order. It's O(V + E) and it handles everything except cycles (which DAGs by definition don't have).
Interview Patterns That Test This
When a problem asks for "minimum steps," "shortest path," or "minimum transformations" in an unweighted setting, the answer is almost always BFS.
Patterns to recognize:
- Grid traversal with uniform cost per move
- Minimum number of operations to reach a target state (Word Ladder, sliding puzzle)
- Minimum hops in a social network or dependency graph
- Earliest level in a tree where a condition is met
DFS-based problems look different. They ask about reachability, connectivity, number of distinct paths, or properties of subtrees. When you see "minimum" combined with "number of steps," think BFS first.
The tell is the cost model: one step, one unit, every path at the same length is equally expensive. BFS was built for exactly this.
Three Mistakes That Look Fine Until They Don't
Using DFS and getting lucky. On small test cases, DFS might accidentally return the shortest path because the graph is small or structured favorably. Your code passes four test cases and fails on the fifth. The bug isn't in your implementation. The bug is the algorithm choice. This is somehow worse to debug.
Forgetting the visited set in BFS. Without it, BFS loops forever on cyclic graphs. Add nodes to visited when they're enqueued, not when they're dequeued. If you wait until dequeue, the same node gets added to the queue multiple times from multiple neighbors, and your queue inflates until the process runs out of memory. The timing matters.
Applying BFS to a weighted graph. If the problem says "minimum cost" and edges have different costs, BFS gives wrong answers. Read the constraints. Every time. That word "cost" is doing a lot of work.
If your interview prep is heavy on grid BFS and light on talking through tradeoffs out loud, SpaceComplexity runs voice-based mock interviews where explaining why you chose BFS over DFS matters as much as writing the code.
For more on graph algorithm selection, see BFS vs Dijkstra: Why One Edge Weight Changes Everything, Shortest Path Algorithm: BFS, Dijkstra, Bellman-Ford, and When to Use Each, DFS Pattern Recognition: Every Problem Leaves the Same Five Clues, and BFS Pattern Recognition: Every Problem Hides One of Five Signals.