Mutable vs Immutable Data Structures: What Changes When You Can't Change

May 28, 202610 min read
dsaalgorithmsinterview-prepdata-structures
Mutable vs Immutable Data Structures: What Changes When You Can't Change
TL;DR
  • Immutability eliminates aliasing bugs: no function can mutate what you passed it, and the most common backtracking bug in interviews is a mutable reference mistake
  • Structural sharing makes persistent updates O(log₃₂ n), not O(n): only the path to the changed node is copied, everything else is shared
  • Hash keys must be immutable: Python's list can't be a dict key; tuple can; Java's String can; StringBuilder can't
  • Mutable wins for single-threaded write-heavy loops: DP tables, sorting, graph traversal, where the 2-3x constant factor matters
  • Immutable wins for shared state, cached return values, and React-style change detection via O(1) reference equality
  • Build-then-freeze is the universal pattern: accumulate with a mutable builder (list, StringBuilder), convert to immutable when sharing

You have a list of strings. You want to build a big string by joining them. The obvious thing is a loop with +=:

result = "" for word in words: result += word

This runs in O(n²). Not because of anything subtle about the algorithm. Because strings in Python are immutable. Every += allocates a new string, copies every character written so far, then throws the old one away. By the thousandth word, you're copying a thousand characters per iteration. Your laptop fan spins up.

That's the mutable vs immutable tradeoff in its most visible form. The more interesting question is why languages make strings immutable in the first place, because once you understand that, "just make everything mutable" looks a lot less appealing.

What Changes When You Assign a Value

A mutable data structure owns a region of memory and changes it in place. When you do arr[2] = 7, the integer 7 is written directly to the memory address that arr[2] points to. One write, done.

An immutable structure doesn't allow that. The memory it occupies is fixed forever. Any operation that looks like a modification actually produces a new object in a new memory location.

Two-panel diagram: left shows a mutable array updated in place; right shows an immutable array producing a new object alongside the original Left: mutable update writes one cell. Right: immutable "update" creates a whole new object. Both versions coexist.

The difference that matters is what happens to other references pointing at the same object. With a mutable structure, every reference sees the change. With an immutable one, the old reference still sees the old value, and only the new reference sees the new object. This split is the source of both immutability's costs and its benefits.

Why Languages Choose Immutability

Aliasing is the first pressure. Mutable objects shared across multiple call sites produce a class of bug that's genuinely hard to trace. Pass a list to a helper function, the helper sorts it in place, and the caller's list is now sorted. Probably you didn't want that. The most common backtracking bug in coding interviews is exactly this: appending a path to a results list, then watching all results become empty because you stored a reference to the same list that gets cleared on every recursive return. Immutability makes this impossible.

Thread safety is another driver. If two threads read and write a mutable object concurrently, you need a lock. Locks are expensive, and getting them wrong produces data races that corrupt state silently. Immutable objects require no synchronization because their state never changes. The Java specification (JLS 17.5) guarantees that objects with final fields are safely visible to all threads after construction. Java's String is immutable for exactly this reason: security checks depend on the string you passed not changing after the check runs.

Hashability is the constraint most engineers forget. Hash maps and sets require that keys don't change their hash value after insertion. Modify a key between insertion and lookup, and the map can't find it. This is why Python's tuple can be a dict key but list cannot. It's also why Java's String.hashCode() caches its result on the first call: since the string can never change, the cached hash is always valid.

The Naive Cost Is Only Half the Story

The obvious objection: if every modification creates a new object, then updating a 10-million-element array costs O(n) time and O(n) space per write. Prohibitive for anything resembling an algorithm.

This is true for one implementation strategy. Nobody actually uses it.

The real technique is structural sharing. When you "update" an immutable structure, the new version shares most of its memory with the old version. Only the part that changed gets copied. Everything else is referenced, not duplicated.

Structural sharing diagram: list_b = [99, 3, 4, 5] shares the tail [3, 4, 5] with list_a = [1, 2, 3, 4, 5] using a single new node list_b prepends 99 to list_a's tail. One new node. Three shared nodes. list_a has no idea and doesn't care.

This works because immutability guarantees the shared part will never change. If nodes could be modified, sharing would be dangerous. Immutability is the precondition that makes sharing safe.

How Structural Sharing Works

The immutable linked list

Prepending to an immutable singly linked list costs O(1) and creates exactly one new node. The new node's pointer simply points at the head of the existing list. The entire tail is shared. Two completely different lists can coexist pointing at the same tail nodes, and neither can affect the other.

The catch: appending to the end costs O(n). You must copy every node from head to the one before the tail, because nodes are immutable and can't have their next pointers changed. This is why functional languages favor cons lists for prepend-heavy access patterns and avoid random-access appends.

Path copying in trees

For a balanced BST with n nodes, updating a node touches O(log n) nodes along the search path. In a persistent BST, you copy exactly those nodes. Everything off the path is shared with the old version.

BST path copying: the update path (root, 4, 2) gets three new nodes; the four off-path nodes (12, 6, 10, 14) are shared Updating node 2 touches root → 4 → 2. Three new nodes. Four shared. The old version is still fully intact.

Updating one node in a tree of a million costs about 20 new allocations. The other 999,980 nodes are shared. Time and space for the update: O(log n), not O(n).

Clojure's persistent vector: the 32-way tree

Phil Bagwell's 2000 paper introduced a refinement: a tree with a branching factor of 32 instead of 2. With 32 children per node, a million-element collection has a tree depth of only 4 (since 32⁴ = 1,048,576). Updating any element copies at most 4 nodes.

Clojure's persistent vector uses this structure for all its standard collections. Get and set run in O(log₃₂ n), which for any collection size you'll encounter in practice is effectively constant. For maps and sets, the same branching structure (Hash Array Mapped Trie, or HAMT) applies. Benchmarks land within 2-3x of mutable Java equivalents single-threaded, and often faster in concurrent code where mutable alternatives need locking.

32-way trie diagram: root node with 32 children, one path highlighted showing 4 copied nodes (root, internal, internal, leaf) with millions of shared sibling subtrees At depth 4, only 4 nodes get copied per update. The other 1,048,572 nodes sit there happily sharing memory, completely unbothered.

Complexity: The Honest Comparison

OperationMutable (array/map)Naive immutable copyPersistent (structural sharing)
Array getO(1)O(1)O(log₃₂ n) ≈ O(1)
Array updateO(1)O(n)O(log₃₂ n) ≈ O(1)
List prependO(n)O(1)O(1)
List appendO(1) amortizedO(n)O(n)
BST insertO(log n)O(n)O(log n)
Hash map getO(1) avgO(1) avgO(log₃₂ n) ≈ O(1)
Hash map setO(1) avgO(n)O(log₃₂ n) ≈ O(1)

The "naive immutable copy" column is what you get if you call Python's copy.deepcopy() on every write. Nobody ships that. The persistent column is what Clojure, Scala's immutable collections, and immer (JavaScript/C++) actually deliver.

When Mutable Wins

Single-threaded, write-heavy inner loops. Sorting, graph traversal, dynamic programming table-filling. These algorithms update the same cells repeatedly. A mutable DP table updated in place is faster than the persistent equivalent. The constant factor difference (2-3x) matters at scale.

Building large structures incrementally. If you're inserting 10 million records, you don't need the intermediate versions. Build it mutable, then freeze it. Clojure's transient mechanism does exactly this: temporarily unlock the persistent structure for bulk writes (2-4x speedup), then convert back.

Cache locality. Persistent structures with pointer-based nodes are less cache-friendly than flat arrays. A mutable int[] of a million elements fits predictably in cache. A persistent vector's tree nodes are scattered across the heap. Those cache misses live inside that O(log₃₂ n) constant factor.

When Immutable Wins

Any shared state. Configuration objects, lookup tables, caches. If multiple threads read a value, immutability eliminates every synchronization concern.

Hash keys. Any value used as a dictionary key must be immutable. Python's tuple and frozenset exist solely to give you immutable versions of list and set.

Change detection. React and Redux mandate immutable state because detecting whether state changed becomes O(1): compare references, not values. oldState === newState is a single pointer comparison. Deep equality on a mutable object tree is O(n). Immer produces structurally shared immutable updates under the hood.

Anything you cache. If you memoize a function, the cached result must not be modifiable by the caller. Returning a mutable object from a cache is a bug waiting to happen. Immutable return values make caching safe by construction.

The Bug You've Already Hit

The string concatenation example at the top is the most common way immutability bites in practice. The fix is the canonical pattern:

# O(n²): immutable string copied on every iteration result = "" for word in words: result += word # O(n): mutable list accumulates parts, join once at the end parts = [] for word in words: parts.append(word) result = "".join(parts)

The intermediate parts list is mutable. You're deliberately using a mutable structure during construction and converting to an immutable string at the end. Same pattern as Clojure's transients: mutate during construction, freeze when sharing.

Side-by-side timelines: string += copies grow quadratically at each step; list append adds one element per step then joins once linearly at the end Left: each step copies everything written so far. The bars grow longer every iteration and you feel it. Right: each step adds one pointer. The final join is one linear pass.

The mental model generalizes: use a mutable builder when constructing, use an immutable value when sharing.

Git Is a Persistent Data Structure

Persistent data structures give you something mutable structures don't: cheap history. Every version of the data is preserved by default, for free, because old nodes are never overwritten. This is how Git works.

Every commit is an immutable content-addressed object. Branching creates a new root pointing at shared history. Merging combines two trees. Nothing gets deleted until garbage collection runs. The entire history is a persistent data structure with structural sharing, and a new commit creates O(depth of changes) new objects.

This is also why undo/redo in collaborative editors is cheap with immutable state. Any version is a pointer to the old root, not a full copy. Rolling back is O(1).

Mutable vs Immutable: What to Remember

  • Mutable: changes the object in place. O(1) for arrays. Faster for isolated, single-threaded writes.
  • Immutable (naive copy): O(n) per write. Don't use this.
  • Immutable (persistent, structural sharing): copies only the path to the changed node. O(log n) or O(log₃₂ n). Thread-safe, hashable, history-preserving.
  • Hash keys: only immutable types can be hash keys. Python's list can't; tuple can. Java's String can; StringBuilder can't.
  • Thread safety: immutable objects need no locks. Mutable shared state requires locks, and a wrong lock is a data race.
  • Build-then-freeze: use a mutable builder during construction, convert to immutable when sharing. ''.join(parts). tuple(list). Clojure transients.
  • React rule: immutable state makes change detection O(1). Mutating state in place makes it undetectable.

If you want to practice articulating tradeoffs like this one under pressure, SpaceComplexity runs voice-based mock interviews where follow-up questions expose whether you understand the concept or just memorized it.

Further Reading