2D Array vs 1D Array: When to Flatten a Matrix and When to Keep It 2D

Nobody sits down to model a grid problem and thinks, "let me pause here to consider my memory layout options." You need a matrix. You write a matrix. The tests pass. You move on.

Except you've just made an implicit architectural choice that your interviewer may probe, that production code will eventually punish, and that nobody explained when they taught you arrays in CS101. Understanding the difference takes five minutes. Not knowing it costs you points on a question where you should be collecting free marks.

A Matrix Is Not What You Think It Is

An array is one-dimensional: a contiguous block of memory. Index 0 is physically adjacent to index 1. You get O(1) access to any element.

A matrix is two-dimensional: a grid of rows and columns. In most languages, it's implemented as an array of arrays. Each row is a separate heap allocation, pointed to by the outer array.

# 2D matrix (array of arrays)
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Same data, flattened to 1D
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]

Nine values. Two completely different memory layouts. One of them is lying to you about how fast it is.

The Part You've Been Skipping

This is where most engineers check out, because "memory layout" sounds like a systems programmer's problem. It isn't.

With a 2D array of arrays, each row is a separate allocation:

outer: [ ptr0 | ptr1 | ptr2 ]
          |       |       |
        [1,2,3] [4,5,6] [7,8,9]   <- three separate heap chunks

Reading matrix[0][2] then matrix[1][0] means: follow ptr0, grab the value, follow ptr1, grab the next value. That second pointer dereference almost certainly causes a cache miss. The rows may be scattered across entirely different physical memory pages. Your CPU loaded the wrong neighborhood.

With a flat 1D array, all nine values are contiguous:

flat: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ]

When the CPU loads flat[0], it pulls a 64-byte cache line, which holds 16 consecutive 32-bit integers. Reading flat[1] through flat[15] costs nothing extra. No extra memory fetches. Just free. Flat arrays deliver up to 16 values per cache miss; arrays of arrays pay a miss at every row boundary.

In practice, this produces the 5-10x difference in tight loops that cache locality benchmarks document. Bjarne Stroustrup measured a linked list traversal running 50-100x slower than the equivalent vector traversal on 500K elements. Same principle. Different shape of the same mistake.

Memory layout comparison: array of arrays scatters rows across heap chunks causing cache misses, while flat 1D array keeps all values contiguous so one cache miss loads 16 values

One Expression Gets You Everywhere

Converting a 2D address to a flat index is one arithmetic expression:

# 2D to flat
flat_index = row * num_cols + col

# Flat back to 2D
row = flat_index // num_cols
col = flat_index % num_cols

Example: in a 3-column grid, cell (1, 2) maps to flat index 1 * 3 + 2 = 5. Flat index 7 maps to row 7 // 3 = 2, col 7 % 3 = 1.

This formula shows up in Union-Find on grids, external library calls, and any algorithm that needs to pass matrix data to a contiguous buffer. Memorize it. It comes up more than you'd expect.

Pick One. Here's How.

Scenario	Prefer	Why
BFS / DFS on a 2D grid	2D array	`grid[r][c]` syntax matches how you reason about neighbors
DP table (LCS, edit distance, etc.)	2D array	Row and column dimensions map directly to recurrence variables
Union-Find on a grid	Flat 1D	`r * cols + c` is both the flat index and the DSU node ID
Row-by-row matrix processing	Flat 1D	One allocation, full cache line utilization
Calling NumPy / BLAS / GPU kernels	Flat 1D	External libraries expect a contiguous row-major buffer
Matrix rotation / transpose	Either	Depends on whether you rotate in-place or into a new array
Sparse matrix (mostly zeroes)	Neither	Use a hash map or CSR format instead

Does your algorithm operate in (row, col) coordinates, or does it treat grid cells as interchangeable integer IDs? BFS does the former. Union-Find does the latter. That's the whole question.

Worked Example 1: BFS on a Grid (Use 2D)

Classic island counting. The 2D representation wins here:

from collections import deque

def num_islands(grid: list[list[str]]) -> int:
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]
    count = 0

    def bfs(r: int, c: int) -> None:
        queue = deque([(r, c)])
        visited[r][c] = True
        while queue:
            row, col = queue.popleft()
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                nr, nc = row + dr, col + dc
                if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc] and grid[nr][nc] == '1':
                    visited[nr][nc] = True
                    queue.append((nr, nc))

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1' and not visited[r][c]:
                bfs(r, c)
                count += 1

    return count

You think in (row, col) pairs. Your code should too. Flattening here adds translation overhead with zero performance benefit, because BFS accesses scattered cells regardless of storage layout. You'd pay the conversion cost and still get the cache misses.

Worked Example 2: Union-Find on a Grid (Use Flat)

Number of connected components. DSU nodes are integers. Map (r, c) to an integer once and stay there:

def count_components(grid: list[list[int]]) -> int:
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        pa, pb = find(a), find(b)
        if pa != pb:
            parent[pa] = pb

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                node = r * cols + c
                if c + 1 < cols and grid[r][c + 1] == 1:
                    union(node, r * cols + (c + 1))
                if r + 1 < rows and grid[r + 1][c] == 1:
                    union(node, (r + 1) * cols + c)

    return len({find(r * cols + c) for r in range(rows) for c in range(cols) if grid[r][c] == 1})

The flat index r * cols + c serves double duty: cell address and DSU node ID. No translation layer needed. The moment you reach for 2D coordinates here, you're adding a conversion step that the flat layout eliminates for free.

Worked Example 3: 2D DP Table (Use 2D)

Longest Common Subsequence. The recurrence is inherently two-dimensional:

def lcs(s1: str, s2: str) -> int:
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]

dp[i][j] is the answer for s1[:i] and s2[:j]. Flattening to dp[i * (n + 1) + j] obscures the recurrence structure and makes it harder to verify during review. Use 2D whenever the row and column dimensions carry distinct semantic meaning. The code is for humans first.

The Numbers

Property	2D Array of Arrays	Flat 1D Array
Space	O(rows × cols) + O(rows) pointers	O(rows × cols)
Element access	O(1), two pointer dereferences	O(1), one index calculation
Cache behavior	Row boundaries cause misses	Sequential access is free
Heap allocations	rows + 1	1
Initialization	Multiple allocations	Single `[0] * (rows * cols)`
Passing to external libraries	Requires copy to contiguous buffer	Ready to pass as-is

Space complexity is the same for dense matrices. The performance difference lives in constant factors: cache misses, allocation count, pointer dereference overhead. At interview scale, both are fast enough. At production scale on tight inner loops, flat wins.

The Invisible Performance Bug

For flat 1D storage, iteration order matters. And it will fail silently.

Row-major order (C, Python, Java, Go) stores element (r, c) at index r * cols + c. Iterating row by row walks memory forward. The CPU prefetcher loves this. It predicts the next line before you ask for it.

Column-major access of row-major storage iterates column by column. Each step jumps cols elements in memory. For large matrices, every single access is a separate cache miss.

# Fast: row-major access matches row-major storage
for r in range(rows):
    for c in range(cols):
        process(grid[r][c])

# Slow: column-major access of row-major storage
for c in range(cols):
    for r in range(rows):
        process(grid[r][c])  # jumps between rows on every step

These two loops are logically identical. They produce the same output. One can run 5-10x slower on matrices that don't fit in cache. Your tests pass on both. Your linter is fine with both. The only way to find this bug is to either know about row-major storage, or to run a profiler on a large matrix and watch your cache miss rate spike. You can write a correct, tested, code-reviewed function that quietly performs 5x worse than it should, and nobody will notice until it matters.

This shows up as a real performance bug in matrix multiplication. The naive triple-nested loop runs fast when the innermost loop walks a row. Transposing the loop order can produce a 5-10x slowdown on matrices that don't fit in cache.

You can read more about why in cache-friendly code patterns and memory locality.

Your Language's Opinion

Python: The gap between list-of-lists and flat list is small at interview scale. Use 2D for readability. NumPy uses flat storage internally, which is why vectorized operations are fast and Python loops over NumPy arrays are not. The entire "numpy is fast, pure Python is slow" story is mostly this one principle.

Java: int[][] is an array of references to int[] objects. Each row is a separate heap allocation. int[] with manual indexing gives a single contiguous block. For competitive programming on large inputs, flat arrays in tight loops show measurable improvement. The JVM doesn't save you here.

C/C++: int matrix[M][N] is a flat contiguous block at the language level. int** matrix is pointer-of-pointers. The former is always cache-friendly. Worth knowing if your interviewer asks you to reason about memory, which in a C++ interview they might.

TypeScript/JavaScript: number[][] is fine. The JIT handles the constant-factor optimization. You have bigger problems than cache line utilization in JavaScript.

The Five-Second Decision

Algorithm uses (row, col) coordinates and navigates to neighbors? Use 2D.
Need integer node IDs (Union-Find, graph adjacency, external call)? Use flat.
Filling a recurrence table where rows and columns mean different things? Use 2D.
Processing the same data millions of times in a tight loop? Flat wins on cache.
Sparse matrix? Skip both and use a hash map.

For most coding interview problems, reach for 2D. It matches how you and your interviewer think about grids. Reserve flat indexing for Union-Find on grids and problems where the cell-as-integer-ID framing fits better than cell-as-coordinate.

In a mock interview setting, the choice of representation is exactly the kind of decision that gets noted in the write-up. Pick the right one and explain why. It signals that you think about memory, not just algorithms. That distinction is what separates a signal-rich answer from one that just happens to compile.

Quick Reference

matrix[r][c] in a 2D array: two pointer dereferences, possible row-boundary cache miss
flat[r * cols + c] in a flat array: one index calculation, sequential access
DSU on grids: always flatten with r * cols + c as the node ID
DP tables: keep 2D unless you're optimizing space with rolling arrays (see dynamic programming framework)
Loop row by row for cache efficiency when storage is row-major
Dense matrix: O(rows × cols) either way. Sparse matrix: use a hash map.