2D Array vs 1D Array: When to Flatten a Matrix and When to Keep It 2D

- 2D array vs 1D array is a memory layout choice: arrays of arrays scatter rows across the heap while a flat array stores everything in one contiguous block
- Cache misses happen at every row boundary in a 2D array; flat storage lets the CPU prefetcher deliver up to 16 integers per miss with no extra cost
- The flat index formula
r * cols + cconverts any cell to an integer node ID, making flat arrays the natural fit for Union-Find on grids - BFS, DFS, and DP tables belong on 2D arrays —
grid[r][c]syntax matches how you reason about neighbors and recurrence variables - Row-major loop order (outer loop over rows) aligns with C, Python, Java, and Go storage; column-major access of row-major storage causes a cache miss on every single step
- Sparse matrices belong in neither structure — use a hash map or CSR format instead
Nobody sits down to model a grid problem and thinks, "let me pause here to consider my memory layout options." You need a matrix. You write a matrix. The tests pass. You move on.
Except you've just made an implicit architectural choice that your interviewer may probe, that production code will eventually punish, and that nobody explained when they taught you arrays in CS101. Understanding the difference takes five minutes. Not knowing it costs you points on a question where you should be collecting free marks.
A Matrix Is Not What You Think It Is
An array is one-dimensional: a contiguous block of memory. Index 0 is physically adjacent to index 1. You get O(1) access to any element.
A matrix is two-dimensional: a grid of rows and columns. In most languages, it's implemented as an array of arrays. Each row is a separate heap allocation, pointed to by the outer array.
# 2D matrix (array of arrays) matrix = [ [1, 2, 3], [4, 5, 6], [7, 8, 9], ] # Same data, flattened to 1D flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Nine values. Two completely different memory layouts. One of them is lying to you about how fast it is.
The Part You've Been Skipping
This is where most engineers check out, because "memory layout" sounds like a systems programmer's problem. It isn't.
With a 2D array of arrays, each row is a separate allocation:
outer: [ ptr0 | ptr1 | ptr2 ]
| | |
[1,2,3] [4,5,6] [7,8,9] <- three separate heap chunks
Reading matrix[0][2] then matrix[1][0] means: follow ptr0, grab the value, follow ptr1, grab the next value. That second pointer dereference almost certainly causes a cache miss. The rows may be scattered across entirely different physical memory pages. Your CPU loaded the wrong neighborhood.
With a flat 1D array, all nine values are contiguous:
flat: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ]
When the CPU loads flat[0], it pulls a 64-byte cache line, which holds 16 consecutive 32-bit integers. Reading flat[1] through flat[15] costs nothing extra. No extra memory fetches. Just free. Flat arrays deliver up to 16 values per cache miss; arrays of arrays pay a miss at every row boundary.
In practice, this produces the 5-10x difference in tight loops that cache locality benchmarks document. Bjarne Stroustrup measured a linked list traversal running 50-100x slower than the equivalent vector traversal on 500K elements. Same principle. Different shape of the same mistake.

One Expression Gets You Everywhere
Converting a 2D address to a flat index is one arithmetic expression:
# 2D to flat flat_index = row * num_cols + col # Flat back to 2D row = flat_index // num_cols col = flat_index % num_cols
Example: in a 3-column grid, cell (1, 2) maps to flat index 1 * 3 + 2 = 5. Flat index 7 maps to row 7 // 3 = 2, col 7 % 3 = 1.
This formula shows up in Union-Find on grids, external library calls, and any algorithm that needs to pass matrix data to a contiguous buffer. Memorize it. It comes up more than you'd expect.
Pick One. Here's How.
| Scenario | Prefer | Why |
|---|---|---|
| BFS / DFS on a 2D grid | 2D array | grid[r][c] syntax matches how you reason about neighbors |
| DP table (LCS, edit distance, etc.) | 2D array | Row and column dimensions map directly to recurrence variables |
| Union-Find on a grid | Flat 1D | r * cols + c is both the flat index and the DSU node ID |
| Row-by-row matrix processing | Flat 1D | One allocation, full cache line utilization |
| Calling NumPy / BLAS / GPU kernels | Flat 1D | External libraries expect a contiguous row-major buffer |
| Matrix rotation / transpose | Either | Depends on whether you rotate in-place or into a new array |
| Sparse matrix (mostly zeroes) | Neither | Use a hash map or CSR format instead |
Does your algorithm operate in (row, col) coordinates, or does it treat grid cells as interchangeable integer IDs? BFS does the former. Union-Find does the latter. That's the whole question.
Worked Example 1: BFS on a Grid (Use 2D)
Classic island counting. The 2D representation wins here:
from collections import deque def num_islands(grid: list[list[str]]) -> int: rows, cols = len(grid), len(grid[0]) visited = [[False] * cols for _ in range(rows)] count = 0 def bfs(r: int, c: int) -> None: queue = deque([(r, c)]) visited[r][c] = True while queue: row, col = queue.popleft() for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]: nr, nc = row + dr, col + dc if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc] and grid[nr][nc] == '1': visited[nr][nc] = True queue.append((nr, nc)) for r in range(rows): for c in range(cols): if grid[r][c] == '1' and not visited[r][c]: bfs(r, c) count += 1 return count
You think in (row, col) pairs. Your code should too. Flattening here adds translation overhead with zero performance benefit, because BFS accesses scattered cells regardless of storage layout. You'd pay the conversion cost and still get the cache misses.
Worked Example 2: Union-Find on a Grid (Use Flat)
Number of connected components. DSU nodes are integers. Map (r, c) to an integer once and stay there:
def count_components(grid: list[list[int]]) -> int: rows, cols = len(grid), len(grid[0]) parent = list(range(rows * cols)) def find(x: int) -> int: while parent[x] != x: parent[x] = parent[parent[x]] x = parent[x] return x def union(a: int, b: int) -> None: pa, pb = find(a), find(b) if pa != pb: parent[pa] = pb for r in range(rows): for c in range(cols): if grid[r][c] == 1: node = r * cols + c if c + 1 < cols and grid[r][c + 1] == 1: union(node, r * cols + (c + 1)) if r + 1 < rows and grid[r + 1][c] == 1: union(node, (r + 1) * cols + c) return len({find(r * cols + c) for r in range(rows) for c in range(cols) if grid[r][c] == 1})
The flat index r * cols + c serves double duty: cell address and DSU node ID. No translation layer needed. The moment you reach for 2D coordinates here, you're adding a conversion step that the flat layout eliminates for free.
Worked Example 3: 2D DP Table (Use 2D)
Longest Common Subsequence. The recurrence is inherently two-dimensional:
def lcs(s1: str, s2: str) -> int: m, n = len(s1), len(s2) dp = [[0] * (n + 1) for _ in range(m + 1)] for i in range(1, m + 1): for j in range(1, n + 1): if s1[i - 1] == s2[j - 1]: dp[i][j] = dp[i - 1][j - 1] + 1 else: dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]) return dp[m][n]
dp[i][j] is the answer for s1[:i] and s2[:j]. Flattening to dp[i * (n + 1) + j] obscures the recurrence structure and makes it harder to verify during review. Use 2D whenever the row and column dimensions carry distinct semantic meaning. The code is for humans first.
The Numbers
| Property | 2D Array of Arrays | Flat 1D Array |
|---|---|---|
| Space | O(rows × cols) + O(rows) pointers | O(rows × cols) |
| Element access | O(1), two pointer dereferences | O(1), one index calculation |
| Cache behavior | Row boundaries cause misses | Sequential access is free |
| Heap allocations | rows + 1 | 1 |
| Initialization | Multiple allocations | Single [0] * (rows * cols) |
| Passing to external libraries | Requires copy to contiguous buffer | Ready to pass as-is |
Space complexity is the same for dense matrices. The performance difference lives in constant factors: cache misses, allocation count, pointer dereference overhead. At interview scale, both are fast enough. At production scale on tight inner loops, flat wins.
The Invisible Performance Bug
For flat 1D storage, iteration order matters. And it will fail silently.
Row-major order (C, Python, Java, Go) stores element (r, c) at index r * cols + c. Iterating row by row walks memory forward. The CPU prefetcher loves this. It predicts the next line before you ask for it.
Column-major access of row-major storage iterates column by column. Each step jumps cols elements in memory. For large matrices, every single access is a separate cache miss.
# Fast: row-major access matches row-major storage for r in range(rows): for c in range(cols): process(grid[r][c]) # Slow: column-major access of row-major storage for c in range(cols): for r in range(rows): process(grid[r][c]) # jumps between rows on every step
These two loops are logically identical. They produce the same output. One can run 5-10x slower on matrices that don't fit in cache. Your tests pass on both. Your linter is fine with both. The only way to find this bug is to either know about row-major storage, or to run a profiler on a large matrix and watch your cache miss rate spike. You can write a correct, tested, code-reviewed function that quietly performs 5x worse than it should, and nobody will notice until it matters.
This shows up as a real performance bug in matrix multiplication. The naive triple-nested loop runs fast when the innermost loop walks a row. Transposing the loop order can produce a 5-10x slowdown on matrices that don't fit in cache.
You can read more about why in cache-friendly code patterns and memory locality.
Your Language's Opinion
Python: The gap between list-of-lists and flat list is small at interview scale. Use 2D for readability. NumPy uses flat storage internally, which is why vectorized operations are fast and Python loops over NumPy arrays are not. The entire "numpy is fast, pure Python is slow" story is mostly this one principle.
Java: int[][] is an array of references to int[] objects. Each row is a separate heap allocation. int[] with manual indexing gives a single contiguous block. For competitive programming on large inputs, flat arrays in tight loops show measurable improvement. The JVM doesn't save you here.
C/C++: int matrix[M][N] is a flat contiguous block at the language level. int** matrix is pointer-of-pointers. The former is always cache-friendly. Worth knowing if your interviewer asks you to reason about memory, which in a C++ interview they might.
TypeScript/JavaScript: number[][] is fine. The JIT handles the constant-factor optimization. You have bigger problems than cache line utilization in JavaScript.
The Five-Second Decision
- Algorithm uses
(row, col)coordinates and navigates to neighbors? Use 2D. - Need integer node IDs (Union-Find, graph adjacency, external call)? Use flat.
- Filling a recurrence table where rows and columns mean different things? Use 2D.
- Processing the same data millions of times in a tight loop? Flat wins on cache.
- Sparse matrix? Skip both and use a hash map.
For most coding interview problems, reach for 2D. It matches how you and your interviewer think about grids. Reserve flat indexing for Union-Find on grids and problems where the cell-as-integer-ID framing fits better than cell-as-coordinate.
In a mock interview setting, the choice of representation is exactly the kind of decision that gets noted in the write-up. Pick the right one and explain why. It signals that you think about memory, not just algorithms. That distinction is what separates a signal-rich answer from one that just happens to compile.
Quick Reference
matrix[r][c]in a 2D array: two pointer dereferences, possible row-boundary cache missflat[r * cols + c]in a flat array: one index calculation, sequential access- DSU on grids: always flatten with
r * cols + cas the node ID - DP tables: keep 2D unless you're optimizing space with rolling arrays (see dynamic programming framework)
- Loop row by row for cache efficiency when storage is row-major
- Dense matrix: O(rows × cols) either way. Sparse matrix: use a hash map.