Breadth-First Search: Level by Level, O(V+E), and One Classic Bug

Shortest path in an unweighted graph? BFS. Level-order traversal of a binary tree? BFS. "Minimum moves to reach the exit?" BFS every time. The breadth-first search algorithm shows up in roughly a quarter of all graph interview questions, and it carries one non-obvious bug that most engineers write at least once before understanding why it fails. Usually right before an onsite. But here you are.

What follows is the complete picture: the mechanics, the actual complexity argument (not hand-waving), every language implementation, and the patterns that tell you when to reach for it. Read it once and stop second-guessing yourself at 11pm.

The CLRS book covers BFS on page 594. We cover it here, with fewer proofs and more working code. (via r/ProgrammerHumor)

Think in Rings

Drop a stone in a still pond. Ripples expand outward in concentric rings. BFS works exactly that way. It explores a graph in rings from the starting node: all nodes one hop away first, then all nodes two hops away, and so on. The key invariant is that BFS always visits every node at distance d before it visits any node at distance d+1.

The data structure enforcing this is a FIFO queue. You push the start node, then loop: pop a node, push all its unvisited neighbors. Because FIFO respects insertion order, every node at the current ring empties out before any node from the next ring is processed. The queue is the ring detector.

That is the whole algorithm. Everything else is just explaining why it works and what breaks when you get it wrong.

How It Works Internally

The Queue Mechanics

Queue state for a BFS from node A:

Graph:
A -- B -- D
|         |
C ------- E

Adjacency list:
A: [B, C]
B: [A, D]
C: [A, E]
D: [B, E]
E: [C, D]

Start: enqueue A, mark visited.
Queue: [A]        visited: {A}

Dequeue A. Neighbors: B, C. Both unvisited, mark and enqueue.
Queue: [B, C]     visited: {A, B, C}

Dequeue B. Neighbors: A (visited), D (not visited). Enqueue D.
Queue: [C, D]     visited: {A, B, C, D}

Dequeue C. Neighbors: A (visited), E (not visited). Enqueue E.
Queue: [D, E]     visited: {A, B, C, D, E}

Dequeue D. Neighbors: B (visited), E (visited). Nothing to enqueue.
Queue: [E]

Dequeue E. Neighbors: C (visited), D (visited). Nothing to enqueue.
Queue: []

Visit order: A, B, C, D, E. Distances from A: B=1, C=1, D=2, E=2.

Notice the structure: the queue at any moment contains the "frontier" between explored and unexplored territory. Nodes at level 1 (B, C) are both in the queue before any level-2 node (D, E) enters.

BFS visits A (dist=0), then B and C (dist=1), then D and E (dist=2). The level rings never mix.

The queue holds the current frontier. Notice B and C both appear before D enters, that is the invariant that guarantees shortest paths.

The Bug You Will Write Once

Mark nodes visited when you enqueue them, not when you dequeue them. This is the most common BFS bug and it quietly destroys correctness and performance.

You are going to want to write the version below. Don't. But you will, so here it is, so you can see exactly what goes wrong.

Wrong version:

def bfs_wrong(graph, start):
    queue = deque([start])
    visited = set()
    while queue:
        node = queue.popleft()
        visited.add(node)          # too late
        for neighbor in graph[node]:
            if neighbor not in visited:
                queue.append(neighbor)

Consider what happens when nodes B and C both have D as a neighbor. B is dequeued first, D is not yet visited, so D is enqueued. Then C is dequeued, D is still not in visited (it has not been dequeued yet), so D is enqueued a second time. On a dense graph with many shared neighbors, the same node can be enqueued dozens of times. Your O(V+E) guarantee evaporates. Level counts are wrong. Shortest distances are wrong.

The fix:

def bfs(graph, start):
    visited = {start}              # mark immediately on enqueue
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)   # mark before pushing
                queue.append(neighbor)

This guarantees each node enters the queue at most once.

Left: marking on dequeue lets both B and C enqueue D. Right: marking on enqueue means C sees D already visited and ignores it. One line of difference, completely different behavior.

How to Track Levels

For many problems (level-order traversal, finding nodes at depth k, computing distances), you need to know which level each node belongs to.

Two clean approaches:

Snapshot the queue size at the start of each level:

from collections import deque

def bfs_by_level(graph, start):
    visited = {start}
    queue = deque([start])
    levels = []
    while queue:
        level_size = len(queue)     # everything currently in queue is at this level
        current_level = []
        for _ in range(level_size):
            node = queue.popleft()
            current_level.append(node)
            for neighbor in graph[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        levels.append(current_level)
    return levels

Store distance alongside the node:

queue = deque([(start, 0)])
visited = {start}
while queue:
    node, dist = queue.popleft()
    for neighbor in graph[node]:
        if neighbor not in visited:
            visited.add(neighbor)
            queue.append((neighbor, dist + 1))

The first approach is clean for tree problems. The second generalizes naturally to graphs where you want explicit distances.

When the Graph Isn't Connected

A single BFS from one source only reaches nodes in the same connected component. To traverse the entire graph, you need an outer loop:

def bfs_all_components(graph):
    visited = set()
    components = []
    for start in graph:
        if start not in visited:
            component = []
            queue = deque([start])
            visited.add(start)
            while queue:
                node = queue.popleft()
                component.append(node)
                for neighbor in graph[node]:
                    if neighbor not in visited:
                        visited.add(neighbor)
                        queue.append(neighbor)
            components.append(component)
    return components

Each outer loop iteration starts a new BFS in an unexplored component. The total work across all BFS runs is still O(V+E) because each vertex and each edge is touched exactly once globally.

Core Operations

The complexity of BFS assumes an adjacency list representation. With an adjacency matrix, examining all neighbors of a vertex takes O(V) time rather than O(degree), which changes the analysis.

Operation	Time (adj. list)	Time (adj. matrix)	Space
BFS traversal	O(V + E)	O(V²)	O(V)
Single-source shortest path (unweighted)	O(V + E)	O(V²)	O(V)
Multi-source shortest path	O(V + E)	O(V²)	O(V)
Connected components	O(V + E)	O(V²)	O(V)
Level-order traversal (tree, n nodes)	O(n)	n/a	O(n)

Why O(V + E), Exactly

This is an aggregate analysis, not amortization. You do not need to invoke the potential method or any accounting trick.

The outer while loop runs once per vertex. Because of the mark-before-enqueue discipline, each vertex enters the queue exactly once, so it also exits exactly once. That accounts for O(V) dequeue operations.

Inside the loop, you iterate over all neighbors of each vertex. Across the entire traversal, you iterate over the neighbor list of every vertex exactly once. In an adjacency list, the sum of all neighbor list lengths is 2|E| for undirected graphs (each edge appears once from each endpoint) and |E| for directed graphs. Either way, total edge-scanning work is O(E).

Total: O(V) + O(E) = O(V + E).

The apparent nested-loop structure (while over queue, for over neighbors) does not multiply to V*E because the inner for loop is bounded by the degree of each specific node, not V for every node.

With adjacency matrix: examining all neighbors of a single vertex requires scanning one full row of size V, so the inner loop is always O(V) per vertex. Total: O(V) * O(V) = O(V²).

Why Space is O(V) but Peaks at the Widest Level

The visited set holds at most V nodes. The queue holds at most the entire frontier at any moment. In the worst case, that is the widest level of the graph.

For a complete binary tree with n nodes, the last level holds floor(n/2) nodes. At the moment BFS is processing the second-to-last level, the entire last level (all ~n/2 nodes) is in the queue simultaneously. So BFS on a wide tree uses roughly n/2 queue slots at peak, which is O(n) space. A tree with a million nodes will have half a million in your queue at once. Your laptop will let you know it noticed.

Contrast this with DFS on the same tree. DFS holds one path from root to current node on the call stack. For a balanced binary tree with height O(log n), DFS uses only O(log n) memory.

The tradeoff is real: BFS uses more memory on wide graphs. DFS uses more memory on deep graphs. For a path graph (1→2→3→…→n), BFS's queue never holds more than 2 nodes at once.

BFS pays in breadth, DFS pays in depth. Neither is universally better, it depends entirely on your graph's shape.

Why BFS Finds the Shortest Path

For unweighted graphs where every edge has cost 1, BFS finds the shortest path from source to every reachable node. The argument is by induction on distance.

Base case: The source node is at distance 0 from itself. BFS visits it first. Correct.

Inductive step: Assume BFS correctly assigns distance d to all nodes at distance d from the source. Those nodes are all in the queue before any node at distance d+1 (from the mark-before-enqueue discipline and FIFO ordering). When BFS dequeues a level-d node and examines its neighbors, any unvisited neighbor must be at distance d+1. That neighbor is marked and enqueued with one more step. No shorter path could exist, because any path shorter than d+1 steps would pass through a node at distance less than d, which is already visited.

Critical limitation: This argument breaks entirely for weighted graphs. If edge weights differ, a path with more hops might cost less than one with fewer hops. For weighted graphs, use Dijkstra's algorithm (which is essentially BFS with a priority queue instead of a plain queue).

One Structure, Every Language

All implementations follow the same contract: accept an adjacency list representation and a start node, return nodes in BFS order. The adjacency list maps each node to its list of neighbors.

from collections import deque

def bfs(graph: dict, start) -> list:
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Example
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'E'],
    'D': ['B'],
    'E': ['C'],
}
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D', 'E']

Use collections.deque, not a plain list. list.pop(0) shifts every element left in O(n), turning your O(V+E) BFS into O(V² + E). On a sparse graph, that is a full factor of V lost on dequeues alone. deque.popleft() is O(1). This one import is the difference between passing and TLE on a big test case.

What Problems Breadth-First Search Solves

BFS applies whenever you need to:

Find the shortest path in an unweighted graph or grid. Every edge has cost 1, so the first time you reach a node it is via the shortest route.
Enumerate nodes by level or distance. Level-order traversal of trees, finding all nodes exactly k hops from a source, computing diameter.
Flood fill. Painting all connected cells in a grid the same color, finding connected regions, identifying islands.
Check bipartiteness. Assign alternating colors to nodes level by level; a conflict means an odd cycle, which means the graph is not bipartite.
Find the minimum number of state transitions. Word ladder, sliding puzzle, lock combinations. Each state is a node, each valid transition is an edge.
Multi-source shortest distance. Find the nearest "thing" (nearest exit in a maze, nearest infected node, nearest empty cell) when there are multiple sources.

BFS does not apply when edges have different weights (use Dijkstra), when you need to detect cycles or explore full paths (use DFS), or when memory is severely constrained and the graph is very wide (DFS's stack depth is proportional to the longest path, not the widest level).

When to Reach for BFS

Two signals practically guarantee BFS:

The problem asks for minimum steps, minimum moves, or shortest path and does not assign different costs to different edges.
The problem requires processing nodes in order of their distance from a source, or layer by layer.

Seeing "minimum" alone is not enough. If edges have different weights, that is Dijkstra or Bellman-Ford. BFS is specifically for the uniform-cost case. Reaching for Dijkstra when BFS would suffice is fine and slightly over-engineered. Reaching for BFS when you need Dijkstra will produce wrong answers on weighted graphs.

Worked Example 1: Shortest Path in a Binary Matrix

Problem statement: You have an n×m grid of 0s and 1s. You can move in 8 directions. Starting at (0,0) and ending at (n-1,m-1), find the length of the shortest clear path (cells with value 0). Return -1 if no path exists.

Why BFS fits: Every move costs exactly 1. The destination is a single target. You want the minimum number of moves. This is a textbook BFS-on-a-grid problem.

How to model it: Each cell is a node. Two nodes are connected by an edge if they are adjacent (8-directionally) and both have value 0. The graph is implicit, you do not build it; you generate neighbors on the fly during BFS.

from collections import deque

def shortest_path(grid):
    n, m = len(grid), len(grid[0])
    if grid[0][0] == 1 or grid[n-1][m-1] == 1:
        return -1

    queue = deque([(0, 0, 1)])   # (row, col, path_length)
    visited = {(0, 0)}

    while queue:
        row, col, length = queue.popleft()
        if row == n - 1 and col == m - 1:
            return length
        for dr in [-1, 0, 1]:
            for dc in [-1, 0, 1]:
                if dr == 0 and dc == 0:
                    continue
                nr, nc = row + dr, col + dc
                if 0 <= nr < n and 0 <= nc < m and grid[nr][nc] == 0 and (nr, nc) not in visited:
                    visited.add((nr, nc))
                    queue.append((nr, nc, length + 1))
    return -1

The visited set marks cells before enqueuing them. The first time you reach (n-1, m-1), length is guaranteed to be the shortest path. You do not need to keep exploring.

Worked Example 2: Word Ladder

Problem statement: Given two words begin and end and a word list, find the length of the shortest transformation sequence from begin to end where each transformation changes exactly one letter and every intermediate word must be in the word list.

Why BFS fits: Each transformation has equal cost (one letter change). You want the minimum number of transformations. This is a shortest-path problem on an implicit graph of words.

How to model it: Each word is a node. Two words are connected if they differ by exactly one character. You do not build the full adjacency list; instead, generate neighbors on the fly by replacing each character position with every letter of the alphabet and checking the word set.

from collections import deque

def word_ladder(begin, end, word_list):
    word_set = set(word_list)
    if end not in word_set:
        return 0

    queue = deque([(begin, 1)])
    visited = {begin}

    while queue:
        word, steps = queue.popleft()
        for i in range(len(word)):
            for char in 'abcdefghijklmnopqrstuvwxyz':
                candidate = word[:i] + char + word[i+1:]
                if candidate == end:
                    return steps + 1
                if candidate in word_set and candidate not in visited:
                    visited.add(candidate)
                    queue.append((candidate, steps + 1))
    return 0

This generates O(L * 26) candidates per word (L = word length) and checks membership in a hash set in O(1). Total complexity: O(V * L * 26) where V is the number of words.

The visited-before-enqueue rule is load-bearing here. Without it, you could enqueue the same word hundreds of times through different transformation paths, and the result would be wrong.

A Useful Variant: Multi-Source BFS

Sometimes you have multiple sources and want the distance from each node to the nearest source. The naive approach runs BFS once per source: O(S * (V + E)) total.

The better approach: seed all sources into the queue at distance 0 and run a single BFS. Every node is discovered via the shortest path to any source, because BFS expands level by level from all sources simultaneously.

def multi_source_bfs(graph, sources):
    visited = set(sources)
    queue = deque((s, 0) for s in sources)
    distances = {}

    while queue:
        node, dist = queue.popleft()
        distances[node] = dist
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return distances

This pattern solves problems like "01 Matrix" (distance of each cell to the nearest 0) and "Walls and Gates" (distance of each empty room to the nearest gate) in a single pass.

Recap

BFS explores a graph level by level using a FIFO queue, visiting all nodes at distance d before any at distance d+1.
Mark nodes visited before enqueuing them, not after dequeuing. Marking after dequeue allows the same node into the queue multiple times and destroys correctness.
Time complexity is O(V+E) with an adjacency list: each vertex is enqueued once (O(V)) and each edge is examined once across the whole traversal (O(E)).
With an adjacency matrix, BFS is O(V²) because checking each node's neighbors requires scanning a full row.
Space is O(V) in the worst case, but peaks at the widest level of the graph. A complete binary tree holds ~n/2 nodes in the queue at the last level; DFS on the same tree holds only O(log n) frames.
BFS guarantees the shortest path in unweighted graphs (all edges cost 1) by induction: the first time any node is discovered, it is via the shortest route.
For weighted graphs, use Dijkstra. For full-path exploration or cycle detection, use DFS.
Multi-source BFS seeds all sources at distance 0 and runs a single traversal, giving each node its distance to the nearest source in O(V+E) total.
Use BFS when the problem asks for minimum steps/hops, nearest node, or level-by-level processing. The word "minimum" combined with equal-cost edges is the clearest signal.

Nailing BFS in an interview is about more than getting the code right. It is explaining your reasoning, catching the mark-before-enqueue discipline without prompting, and choosing between BFS and DFS with a clear justification. If you want practice stating that reasoning out loud under realistic interview pressure, SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback on exactly that kind of decision-making.

For more interview pattern work, the posts on recognizing when to use dynamic programming and why nested loops cost you the offer cover similar pattern-recognition thinking for other algorithm families.