Graph Data Structure Interview: Adjacency Lists, BFS, DFS, and Thinking in Graphs

- Adjacency list is O(V+E) space and the right default for interview problems; reach for an adjacency matrix only on dense graphs or when you need O(1) edge lookup.
- BFS explores level by level with a queue, finds shortest paths in unweighted graphs, and can hold O(V) nodes at once on wide graphs.
- DFS uses the call stack or an explicit stack, enables cycle detection and topological sort, and will blow Python's 1,000-frame recursion limit on deep path graphs.
- Both traversals run O(V+E) time; the difference is memory shape, not asymptotic complexity.
- Recognizing graph problems in disguise (grids, dependency chains, state transformations) is the skill that separates candidates who pass from those who stall.
- Mark nodes visited at enqueue time, not dequeue time, to prevent the same node from being enqueued multiple times.
- In Python, switch to iterative DFS with an explicit stack for any graph that could have more than a thousand nodes.
You have a set of cities and roads between them. Or a set of courses and prerequisites. Or a grid of cells where some are land and some are water. These are completely different problems. They're all graphs.
The graph is the most flexible data structure in your interview toolkit: anything that can be described as "things and relationships between things" is a graph. Once you can recognize when a problem is secretly a graph, you already know which algorithm to reach for. The hard part is seeing it.
This deep dive covers how graphs are stored in memory, what BFS and DFS actually do to that memory (not just "queue" and "stack"), when each representation wins, and how to spot graph problems in disguise.
Two Ways to Store a Graph
A graph is a set of vertices V and edges E. You need to store that somewhere. You have two main choices.
The Adjacency Matrix
An adjacency matrix is a V×V grid. If there's an edge from vertex i to vertex j, you set matrix[i][j] = 1 (or the edge weight). That's it.
Vertices: 0, 1, 2, 3
Edges: 0-1, 0-2, 1-3
0 1 2 3
0 [0, 1, 1, 0]
1 [1, 0, 0, 1]
2 [1, 0, 0, 0]
3 [0, 1, 0, 0]
Edge lookup is O(1). "Is there a road from city 2 to city 7?" Just check matrix[2][7]. Done.
The cost: you always allocate V² space. For a graph with 10,000 vertices, that's 100 million cells. Whether you have 10 edges or 10 billion, you pay the same. It's like booking a 10,000-seat venue every time you call a meeting. The venue has no opinions about your attendance.
The Adjacency List
An adjacency list stores, for each vertex, the list of its neighbors.
0 -> [1, 2]
1 -> [0, 3]
2 -> [0]
3 -> [1]
Space is O(V + E). For undirected graphs, each edge appears twice (once in each direction), so the total neighbor-list entries is 2E. You only allocate what you actually use.
Edge lookup is O(degree(v)), meaning you scan the neighbor list of vertex v. For sparse graphs where each vertex has a handful of neighbors, this is fast in practice. Worst case for a star graph (one hub connected to every other vertex), the hub has degree V-1, making lookup O(V).
Which One to Use
The deciding factor is density: how many edges do you have relative to V²?
| Condition | Use |
|---|---|
| Dense graph (E close to V²) | Adjacency matrix |
| Sparse graph (E much smaller than V²) | Adjacency list |
| Need O(1) edge lookup frequently | Adjacency matrix |
| Need to iterate over neighbors efficiently | Adjacency list |
| Unknown or varying graph size | Adjacency list |
Most real-world graphs are sparse. Social networks, road maps, dependency graphs, web link graphs: they all have far fewer edges than V². The default for almost every interview problem is the adjacency list.
What Actually Lives in RAM
Adjacency Matrix Layout
The matrix is a flat 2D array (or array of arrays). All cells live in contiguous memory. When you scan a row (iterating all neighbors of a vertex), you get beautiful cache locality: the CPU prefetches the next cell before you ask for it.
The problem is the wasted space. A matrix representing a graph with 1,000 vertices and 2,000 edges still allocates one million cells. 998,000 of them are zero.
Memory:
[Row 0: 0,1,1,0,...][Row 1: 1,0,0,1,...][Row 2: ...]
contiguous, cache-friendly, but mostly empty for sparse graphs
16 cells for 4 vertices. Scale to 10,000 vertices and you're allocating 100 million cells before you've processed a single edge.
Adjacency List Layout
In most language implementations, an adjacency list is an array (or hash map) where each entry points to a dynamic array (or linked list) of neighbors.
heads array: [ptr0] [ptr1] [ptr2] [ptr3]
| | | |
[1,2] [0,3] [0] [1]
Each neighbor list may live at a different heap location. When you traverse from one vertex to its neighbors, then to those neighbors' lists, you're pointer-chasing across memory. This is cache-unfriendly compared to the matrix scan. For very hot traversal code (Dijkstra on a dense graph), this pointer-chasing cost shows up in profiles.
In practice, the memory savings of the adjacency list win decisively for sparse graphs. The extra cache misses are a small constant against the savings from not allocating V² space.
Head pointers live in one array. The neighbor lists are scattered across the heap. Cache misses are real but cheap compared to not allocating V² cells.
Operation Costs and Why They Hold
Complexity Table
| Operation | Adjacency List | Adjacency Matrix |
|---|---|---|
| Space | O(V + E) | O(V²) |
| Add edge | O(1) | O(1) |
| Remove edge | O(degree(v)) | O(1) |
| Edge lookup (u, v) | O(degree(u)) | O(1) |
| Iterate all neighbors of v | O(degree(v)) | O(V) |
| Add vertex | O(1) | O(V²) |
| Remove vertex | O(V + E) | O(V²) |
Why the Bounds Hold
Space: The adjacency list stores one list entry per edge endpoint, so 2E entries total for an undirected graph. Plus V entries for the heads array. Hence O(V + E). The matrix is always V rows by V columns, so O(V²) regardless of E.
Edge lookup in the list: To check if vertex u has an edge to v, you scan u's neighbor list until you find v or exhaust it. That list has exactly degree(u) entries. Average degree for a graph with E edges is 2E/V, so the average case is O(E/V).
Iterating neighbors: The list gives you exactly degree(v) entries to read, so O(degree(v)). The matrix must scan the entire row of V cells to find non-zero entries, so O(V) regardless of actual degree.
Add vertex to matrix: You need a (V+1)×(V+1) matrix. You have to allocate new memory and copy the old data. That copy touches all V² existing cells.
BFS vs DFS: What They Actually Do to Memory
Both traversals visit every vertex once and process every edge once: time is O(V + E) for both. The difference is in which memory they use and how much of it.
BFS: The Queue
BFS explores level by level. Put the start node in a queue. Dequeue a node, enqueue all unvisited neighbors, repeat.
Graph: BFS from 0:
0 -- 1 Level 0: [0]
| | Level 1: [1, 2]
2 -- 3 Level 2: [3]
The queue holds the current "frontier": all nodes you know about but haven't processed yet. The queue size is bounded by the maximum number of nodes at any one level. For a perfect binary tree, the last level has n/2 nodes, so the queue hits O(n) at that level. For a star graph (one hub, V-1 leaves), BFS from the hub enqueues all V-1 leaves at once.
Worst-case space for BFS: O(V). You might hold nearly every vertex in the queue simultaneously.
BFS explores level by level. The queue holds the entire frontier at each step.
DFS: The Stack (or Call Stack)
DFS dives as deep as it can before backtracking. In the recursive implementation, the call stack is the stack. In iterative form, you use an explicit stack.
Graph (path): 0 -- 1 -- 2 -- 3 -- 4
DFS from 0 recurses: explore(0) -> explore(1) -> explore(2) -> explore(3) -> explore(4)
Call stack depth: 5 frames
The call stack depth equals the depth of the DFS tree, which in the worst case (a path graph) equals V. For a path graph with 10,000 vertices, recursive DFS will maintain 10,000 stack frames. In Python, the default recursion limit is 1,000. You'll hit RecursionError before you finish traversing a moderately long chain. Python will not explain itself. It will just stop.
This is why you switch to iterative DFS with an explicit stack for large inputs. You can also raise the limit with sys.setrecursionlimit(), which feels like the right answer until you wonder why you're arguing with the runtime about how deep you're allowed to go.
Worst-case space for DFS: O(V). But the memory shape is different from BFS: DFS uses a deep narrow stack (one path at a time), while BFS uses a wide shallow queue (one level at a time).
Each recursive call adds a frame. On a path graph, every node is someone else's only neighbor, so the stack reaches depth V.
When BFS Memory Blows Up vs. When DFS Memory Blows Up
- Wide, shallow graph (like a social network hub): BFS queues the entire first level. DFS recurses only one step deep. DFS wins.
- Narrow, deep graph (like a path or a deep tree): DFS recurses to depth V. BFS holds at most a few nodes at each level. BFS wins.
- Complete graph: Both are O(V). BFS queues all neighbors of the start node; DFS recurses V deep in the worst case.
Neither algorithm is universally better on memory. The graph's shape determines which hurts more.
Wide graph kills BFS. Deep graph kills DFS. Know your graph before you pick your traversal.
Implementations
from collections import defaultdict, deque class Graph: def __init__(self): self.adjacency = defaultdict(list) def add_edge(self, u, v): self.adjacency[u].append(v) self.adjacency[v].append(u) def bfs(self, start): visited = {start} queue = deque([start]) order = [] while queue: node = queue.popleft() order.append(node) for neighbor in self.adjacency[node]: if neighbor not in visited: visited.add(neighbor) queue.append(neighbor) return order def dfs(self, start): visited = set() order = [] def explore(node): visited.add(node) order.append(node) for neighbor in self.adjacency[node]: if neighbor not in visited: explore(neighbor) explore(start) return order
Use collections.deque for BFS. A regular list's pop(0) is O(n) because it shifts every element; deque.popleft() is O(1).
What Graphs Are Actually Good For
Connectivity. Is vertex A reachable from vertex B? How many connected components exist? A single DFS or BFS answers these in O(V + E).
Shortest paths in unweighted graphs. BFS finds the shortest path by level: the first time you reach a node, you've reached it in the fewest steps. Add weights and you need Dijkstra (non-negative weights) or Bellman-Ford (negative weights allowed).
Cycle detection. In an undirected graph, DFS finds a cycle when it encounters a visited node that isn't the current node's parent. In a directed graph, you look for back edges: edges pointing to a node that's currently on the recursion stack.
Topological ordering. For a directed acyclic graph (DAG), topological sort gives you a linear ordering where all edges go forward. DFS-based topological sort, or Kahn's algorithm (BFS-based), both run O(V + E). This is the mechanism behind dependency resolution.
Flood fill and connected regions. Any grid-based problem about connected regions is a graph problem where each cell is a vertex and edges connect adjacent cells.
Recognizing Graph Problems in Coding Interviews
This is the real skill. Many interview problems don't announce themselves as graph problems.
Signal: "Things with relationships"
If a problem describes entities and connections between them, you're looking at a graph. The signals:
- "Adjacent" cells in a grid
- "Prerequisite" or "depends on" relationships between tasks
- "Can transform into" relationships between states
- "Connected" or "reachable" in any sense
- Anything involving paths, routes, or sequences of steps
Worked Example 1: Number of Islands
Problem: Given a 2D grid of '1's (land) and '0's (water), count the number of islands.
Why it's a graph: Each cell is a vertex. An edge exists between two cells if they're adjacent (up, down, left, right) and both are land. An island is a connected component.
Solution: Iterate over every cell. When you find an unvisited '1', run BFS or DFS from it, marking all reachable land cells as visited. Increment the island count by one. Each BFS/DFS call visits one complete connected component.
def num_islands(grid): if not grid: return 0 rows, cols = len(grid), len(grid[0]) visited = set() count = 0 def bfs(row, col): queue = deque([(row, col)]) visited.add((row, col)) while queue: r, c = queue.popleft() for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]: nr, nc = r + dr, c + dc if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited and grid[nr][nc] == '1': visited.add((nr, nc)) queue.append((nr, nc)) for r in range(rows): for c in range(cols): if grid[r][c] == '1' and (r, c) not in visited: bfs(r, c) count += 1 return count
The grid is never explicitly converted into an adjacency list. The graph is implicit: the for dr, dc loop is just "iterate neighbors."
Worked Example 2: Course Schedule (Cycle Detection in a Directed Graph)
Problem: There are n courses (0 to n-1) and a list of prerequisites [a, b] meaning "take b before a." Can you finish all courses?
Why it's a graph: Each course is a vertex. Each prerequisite pair [a, b] is a directed edge from b to a. You can finish all courses if and only if the directed graph has no cycle (circular dependencies make it impossible).
Solution: Build a directed adjacency list. Run DFS. Maintain a "currently on the recursion stack" set in addition to the "visited" set. If DFS reaches a node that's currently on the stack, there's a cycle.
def can_finish(num_courses, prerequisites): adjacency = defaultdict(list) for course, prereq in prerequisites: adjacency[prereq].append(course) visiting = set() visited = set() def has_cycle(node): if node in visiting: return True if node in visited: return False visiting.add(node) for neighbor in adjacency[node]: if has_cycle(neighbor): return True visiting.remove(node) visited.add(node) return False for course in range(num_courses): if has_cycle(course): return False return True
The visiting set tracks nodes on the current path. The visited set is a memo: if you've already fully explored a node and found no cycle, don't explore it again. This gives O(V + E) instead of exponential.
More Signals That Suggest a Graph
- "Minimum number of steps/moves": BFS on a state graph where each state is a vertex and each valid move is an edge.
- "Can you reach X from Y": reachability query, DFS or BFS.
- "All paths from X to Y": DFS with backtracking.
- "Ordering with constraints": topological sort.
- "Are these two things in the same group": union-find or connected components.
If the dynamic programming framework is about recognizing overlapping subproblems, the graph framework is about recognizing hidden connectivity. Many DP problems (counting paths in a DAG, shortest path with constraints) are graph problems underneath.
The Non-Obvious Things People Get Wrong
Forgetting to mark visited before enqueuing. If you mark a node visited only when you dequeue it, you can enqueue the same node multiple times before it's processed. For a dense graph, this causes redundant work and can blow up the queue size. Mark visited when you enqueue. This bug passes every small test case and then silently doubles your queue in production.
Using recursive DFS on large inputs in Python. The default recursion limit is 1,000. A path graph with 1,001 nodes will hand you a RecursionError mid-traversal with no further explanation. Use sys.setrecursionlimit() if you need a quick fix, or write iterative DFS with an explicit stack if you want something you can actually defend.
Iterating neighbors of the adjacency matrix. Beginners sometimes build an adjacency list and then check matrix[u][v] for every possible v. That's O(V) per vertex, turning O(V + E) algorithms into O(V²). Use the list if you have it.
Not handling disconnected graphs. BFS and DFS from a single source only visit the connected component containing that source. If you need to visit every vertex, loop over all vertices and start a new traversal for any that haven't been visited. You test with a connected graph. Your interviewer has a disconnected one ready.
Recap
- Adjacency list: O(V + E) space, O(degree(v)) neighbor iteration, default for sparse graphs.
- Adjacency matrix: O(V²) space, O(1) edge lookup, use only when the graph is dense or constant-time edge queries matter.
- BFS uses a queue, explores level by level, finds shortest paths in unweighted graphs, worst-case O(V) queue memory.
- DFS uses the call stack (or explicit stack), explores depth-first, enables cycle detection and topological sort, worst-case O(V) stack depth.
- Both traversals run O(V + E) time.
- The queue memory in BFS balloons on wide graphs; the stack depth in DFS blows up on deep graphs.
- Mark nodes visited at enqueue time, not dequeue time.
- Recursive DFS in Python hits the default 1,000-frame limit; use iterative DFS for large inputs.
- Grid problems, dependency problems, and transformation problems are often graph problems in disguise.
Graph problems show up constantly in technical interviews, and the ones that trip people up are rarely the ones labeled "graph." They're the grid flood-fill, the dependency cycle, the word transformation. Practicing recognition matters as much as practicing the algorithms themselves.
If you want to practice spotting graph problems under interview pressure, SpaceComplexity runs voice-based mock interviews with rubric-based feedback covering exactly this kind of pattern recognition. The goal is to internalize the "this is a graph" instinct so it fires automatically.
For more on interview problem patterns, see Nested Loops Will Cost You the Offer for a similar deep dive into the sliding window pattern, and Ask These Clarifying Questions First for how to handle the modeling step before you write any code.