Adjacency List vs Adjacency Matrix: E vs V² Is the Whole Answer

You're implementing a graph algorithm. First decision isn't the algorithm. It's the container. Pick wrong and BFS silently becomes O(V²) instead of O(V+E). It won't crash. It'll just be slow on large inputs, which you'll discover at the worst possible moment.

The short answer: if E is much smaller than V², use an adjacency list. If E is close to V², use a matrix. For most real-world graphs and almost every interview problem, that means the list.

Two Ways to Answer "Who Are My Neighbors?"

Every graph operation reduces to one question: given a vertex u, which vertices can I reach from it?

An adjacency matrix answers by pre-allocating a V×V grid. mat[u][v] = 1 if the edge (u, v) exists, 0 otherwise. Lookup is one array index. The cost is that you pay for all V² slots regardless of how many edges actually exist.

An adjacency list answers by storing only the edges that are real. Each vertex gets a list (or set) of its actual neighbors. Nothing is allocated for absent edges.

Take a small concrete graph:

Four vertices, four edges. Both representations:

# Adjacency Matrix (4×4)
matrix = [
    [0, 1, 0, 1],  # vertex 0: connects to 1, 3
    [1, 0, 1, 0],  # vertex 1: connects to 0, 2
    [0, 1, 0, 1],  # vertex 2: connects to 1, 3
    [1, 0, 1, 0],  # vertex 3: connects to 0, 2
]

# Adjacency List
adj = {
    0: [1, 3],
    1: [0, 2],
    2: [1, 3],
    3: [0, 2],
}

Same graph. Same information. Now watch what happens as V and E scale.

V² Is Fixed. V + E Grows With Your Edges.

The matrix always allocates V² cells. Exact, not amortized. Every potential edge gets a slot whether it exists or not.

The list allocates one header per vertex (V slots total) plus one entry per directed edge. For undirected graphs, each edge appears twice, once in each endpoint's list. The total count of list entries is exactly 2E.

The Handshaking Lemma makes this precise: the sum of all vertex degrees equals 2E in an undirected graph, because each edge contributes 1 to the degree of both its endpoints. Summing across all adjacency lists therefore counts each edge exactly twice. Total space: V headers + 2E entries = O(V + E).

For directed graphs, each edge appears exactly once (in the source vertex's list), so it's V + E.

Put real numbers to this. A city road network has roughly 10,000 intersections and 20,000 road segments (each intersection connects to about four roads on average):

V = 10,000    E = 20,000

Matrix:  V²      = 100,000,000 cells
List:    V + E   =     30,000 cells

Ratio: 3,333×

Road networks are not particularly sparse, and the ratio is already 3,333×. A social network with a million users averaging twenty friends each has edge density below 0.00002%. The matrix would allocate one trillion cells for two million actual edges. Not stored then ignored. Visited, found to contain zero, and discarded. On every BFS step.

Adjacency matrix 4x4 grid with blue cells for edges and dark cells for absent connections vs adjacency list with only actual neighbor entries The same 4-vertex graph, two ways. The matrix allocates 16 cells for 8 edges. The list allocates 8.

The Operation That Decides Everything

Checking whether a specific edge (u, v) exists is O(1) with a matrix and O(degree(u)) with a list. The matrix wins there, cleanly.

But finding all neighbors of a vertex is the core operation for every traversal algorithm, and the asymmetry flips.

With the matrix, you scan the entire row for u. That is V cells. degree(u) of them are actual neighbors. The rest are zeros you visit specifically to confirm they're zeros:

# Matrix: finding neighbors of u
for v in range(V):
    if matrix[u][v]:        # must check all V cells
        process_neighbor(v)

With the list, you iterate only the real neighbors:

# List: finding neighbors of u
for v in adj[u]:            # only actual neighbors
    process_neighbor(v)

For BFS or DFS on the whole graph, you do this for every vertex. The matrix pays O(V) per vertex, O(V²) total. The list pays O(degree(u)) per vertex, O(V + E) total.

Operation	Adjacency Matrix	Adjacency List
Space	Θ(V²)	Θ(V + E)
Edge check (u, v)	O(1)	O(degree(u))
Find all neighbors of u	O(V)	O(degree(u))
BFS / DFS full graph	O(V²)	O(V + E)
Add edge	O(1)	O(1)
Remove edge	O(1)	O(degree(u))
Add vertex	O(V²)	O(1)

For the road network: matrix BFS does 100 million operations. List BFS does 30,000. This is not a constant factor. It is the difference between an algorithm that finishes and one that times out.

BFS Makes the Gap Concrete

BFS on the 4-vertex cycle starting at vertex 0.

With the matrix, every dequeue triggers a full row scan:

Dequeue 0: scan [0, 1, 0, 1], check 4 cells, find 2 neighbors
Dequeue 1: scan [1, 0, 1, 0], check 4 cells, find 1 new neighbor (2)
Dequeue 3: scan [1, 0, 1, 0], check 4 cells, find 0 new neighbors
Dequeue 2: scan [0, 1, 0, 1], check 4 cells, find 0 new neighbors

Total cell inspections: 16 = V². Every row, every time. Even on a four-vertex graph you're auditing every possible connection between every possible pair.

With the list, each dequeue iterates only real neighbors:

Dequeue 0: visit [1, 3], 2 entries checked
Dequeue 1: visit [0, 2], 2 entries checked
Dequeue 3: visit [0, 2], 2 entries checked
Dequeue 2: visit [1, 3], 2 entries checked

Total entries examined: 8 = 2E. For sparse graphs, 2E is far smaller than V². That is the entire argument.

BFS with adjacency matrix scanning all 4 cells per row vs adjacency list visiting only 2 real neighbors per dequeue Dequeue vertex 0. Matrix: check 4 cells. List: visit 2 neighbors. The gap compounds across every vertex.

When the Matrix Wins

There are exactly two situations where the matrix is the right tool.

Dense graphs. If E is close to V², the list's O(V + E) approaches O(V²) anyway. The space advantage disappears, and the matrix's O(1) edge lookup becomes attractive. A complete graph has V(V-1)/2 edges. A chess king's graph (grid where each cell connects to all 8 neighbors) is relatively dense. When E/V² is near 1, reach for the matrix.

All-pairs shortest paths. Floyd-Warshall is the matrix's moment. Its entire career leads here. The algorithm needs O(1) access to dist[i][k] and dist[k][j] for every triple (i, j, k). Both lookups must be O(1), or the triple-nested loop blows past O(V³). The matrix delivers this naturally. A list can't, not without building a V×V distance table first, which is just the matrix again:

def floyd_warshall(dist):
    # dist[i][j] = edge weight, or float('inf') if no direct edge
    V = len(dist)
    for k in range(V):
        for i in range(V):
            for j in range(V):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

Every dist[i][k] access is a single array index. See the Floyd-Warshall algorithm deep dive for the full DP derivation.

Weights Don't Change the Argument

Both representations handle edge weights without changing the complexity arguments.

Matrix: store the weight at mat[u][v] instead of 1. Use float('inf') for absent edges (so Floyd-Warshall initialization is trivial) and 0 on the diagonal.

List: store (neighbor, weight) tuples instead of bare neighbor IDs.

# Weighted adjacency list
adj = {
    0: [(1, 4), (3, 7)],
    1: [(0, 4), (2, 2)],
    2: [(1, 2), (3, 6)],
    3: [(0, 7), (2, 6)],
}

# Weighted adjacency matrix (inf = no edge)
INF = float('inf')
matrix = [
    [0,   4,   INF, 7  ],
    [4,   0,   2,   INF],
    [INF, 2,   0,   6  ],
    [7,   INF, 6,   0  ],
]

For Dijkstra, the list is still correct. Push (cost, vertex) pairs onto a heap, iterate only real neighbors at each pop. The matrix version works, technically. It also scans V cells per pop, which is O(V²) total neighbor scanning on a problem that wants O(E log V).

Adjacency List vs Adjacency Matrix: Which Do You Pick?

Decision flowchart showing dense graph or O(1) edge lookup requirement routes to adjacency matrix while sparse graphs route to adjacency list Two questions. Most graphs skip both yes-branches and go straight to the list.

density = E / (V * (V - 1))

If density > 0.1  (roughly):  consider adjacency matrix
If density < 0.1:             use adjacency list

Three questions to answer before you code:

How sparse is the graph? Real-world graphs are almost always sparse. Road maps, social connections, dependency trees, grid mazes: all have E far below V². The list is the default.
Does the algorithm need O(1) arbitrary edge queries? Floyd-Warshall and transitive closure do. BFS, DFS, Dijkstra, Bellman-Ford, and topological sort do not. For the traversal family, the list is correct.
Is V small enough that V² fits in memory? V up to a few hundred, matrix is fine. V in the thousands or millions, O(V²) memory is out of the question.

In interviews, say this before you code. "Sparse graph, so I'll use an adjacency list. If we needed O(1) edge checks between arbitrary pairs I'd switch to a matrix." That sentence gets written down. Most candidates just start typing.

For how BFS and DFS behave on both representations, BFS vs DFS covers the traversal trade-offs in detail. The Graph Data Structure article covers implementation patterns across common interview problems.

What Real Graph Libraries Do

Production graph libraries rarely use a plain list-of-lists. The Compressed Sparse Row (CSR) format packs all neighbor lists into a single flat array with an index marking where each vertex's neighbors start. Same O(V + E) space, same O(degree) traversal, but neighbors sit contiguously in memory. Sequential reads instead of pointer chasing. SciPy's csgraph and most HPC graph frameworks use CSR. For interviews, dict-of-lists is fine. Knowing CSR exists signals you understand why memory layout matters at scale.

The Rules in One Place

Adjacency matrix: V×V grid, Θ(V²) space, O(1) edge check, O(V) neighbor scan. Right for dense graphs, all-pairs algorithms like Floyd-Warshall, or when V is small.
Adjacency list: array of neighbor lists, Θ(V + E) space, O(degree) neighbor scan, O(degree) edge check. Right for sparse graphs and every traversal algorithm.
The crossover: when E is close to V², the two representations converge in cost and the matrix's O(1) edge lookup tips the balance.
Most real-world graphs are sparse. Road networks, social graphs, dependency DAGs, grid mazes: all have E far below V². The list is the default.
In BFS and DFS, the matrix forces you to scan every row, not just actual edges. The resulting O(V²) instead of O(V + E) is not academic; it times out on large inputs.

Explaining this trade-off without being prompted is exactly what separates candidates who know graphs from candidates who can talk through them. SpaceComplexity runs voice-based DSA mock interviews where the first question might literally be "how are you representing this graph, and why." That's the kind of fluency that only builds by saying it out loud.