What Is a Doubling Array? The Amortized O(1) Trick Behind Every append()

June 18, 20268 min read
dsaalgorithmsinterview-prepdata-structures
What Is a Doubling Array? The Amortized O(1) Trick Behind Every append()
TL;DR
  • Doubling arrays copy into a 2x block whenever full, so capacity grows 1→2→4→8→16…
  • Resize copies sum to at most 2n total (geometric series), giving amortized O(1) per append
  • Saying "append is O(1)" is imprecise; amortized O(1) is the technically correct phrasing for interviews
  • Any growth factor > 1 works, but larger factors (2x vs 1.5x) trade more wasted space for fewer resizes
  • Prepend and mid-array insert stay O(n) per op because the shift cost is unavoidable regardless of capacity
  • Preallocation (reserve, ArrayList(n), [None]*n) skips all resizes when final size is known upfront

You have a list. You call append() ten thousand times. Nothing crashes. The list grows, seemingly without limit. Zero segfaults. Zero exceptions. Your laptop fans don't even spin up.

That's not magic. It's a doubling array: an array that copies itself into a block twice the size whenever it runs out of room. Every major language ships one. Python calls it list. Java calls it ArrayList. C++ calls it vector. They all work the same way.

The interesting part is the cost. A single append can take O(n) time. And yet we say append is O(1). Both of those things are true. Understanding why is one of the more useful ideas in CS, and interviewers probe it more often than you'd expect. If you answer "O(1)" without the word "amortized," a good interviewer will push back. Here's why they're right to.

Fixed Arrays Have a Hard Ceiling

An ordinary array is a contiguous block of memory. You ask the OS for 10 slots, you get 10 slots. If you need an 11th, you're out of luck. The memory addresses after index 9 belong to something else. Your variable. A return address. Who knows.

So the moment you want a growable array, you have to allocate a new, larger block and copy everything over. That copy costs O(n). There's no way around it.

The question is: how often do you pay that cost?

The Naive Approach Makes You Pay Every Time

Say you grow the array by one slot each time it's full. You start with capacity 1 and push elements one by one.

  • Push 1: fine, capacity was 1.
  • Push 2: full. Allocate size 2, copy 1 element.
  • Push 3: full. Allocate size 3, copy 2 elements.
  • Push 4: full. Allocate size 4, copy 3 elements.

To push n elements, you copy 0 + 1 + 2 + ... + (n-1) elements total. That sum is n(n-1)/2. That's O(n²) work just to push n things. A 10,000-element array costs 50 million copies. A list in Python or Java that worked this way would be unusable for anything real. You'd throw your laptop into the sun before it finished.

Me after being warned 47 times not to grow arrays one slot at a time

Every CS professor watching you discover this the hard way.

The Doubling Array Brings Total Work Down to O(n)

Instead of adding one slot, double the capacity each time: 1, 2, 4, 8, 16, and so on.

Now copies happen only at each capacity boundary. Push 1,000 elements and the resizes occur at sizes 1, 2, 4, 8, 16, 32, 64, 128, 256, 512. The copy costs at each step: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512.

Total copies: 1 + 2 + 4 + ... + 512 = 1023.

That's a geometric series. Its sum is always less than twice the last term: 2 × 512 = 1024. In general, for n insertions, total copy work is at most 2n.

Divide 2n total work across n insertions and you get O(1) per insertion on average. The occasional expensive resize is spread across the many cheap insertions that came before it.

This is called amortized O(1): each operation costs O(1) averaged across a sequence, even when some individual operations are expensive. The word matters in interviews. Saying "append is O(1)" is technically imprecise. "Append is amortized O(1)" is correct.

The Bank Account Proves It

Here's a clean way to see why the math works out. Give each insertion three credits: one pays for the insertion itself, and two are banked.

When the array fills at capacity C, you need to copy all C elements into the new 2C block. Since the last resize, you've added at least C/2 new elements. Each of those banked two credits. That's C credits total, exactly enough to cover all C copies.

The key invariant is that banked credits always accumulate fast enough to cover the next resize, no matter when it happens. This is a version of the potential method from formal amortized analysis. The expensive operations are never frequent enough to dominate.

Think of it like a bar tab you're splitting with your past self. Every cheap append pre-pays a tiny fraction of the next big resize. By the time the resize arrives, past-you has already covered the bill.

Why 2x? The Growth Factor Tradeoff

Doubling is not the only valid choice. Any constant growth factor greater than 1 gives amortized O(1). For a growth factor g, total copies for n insertions is bounded by n × g/(g-1), which is O(n). So 1.5x works. 1.25x works. 3x works. They all give constant amortized time.

What changes is the space-time tradeoff:

  • Larger factor (2x, 3x): fewer resizes, better throughput, more wasted capacity.
  • Smaller factor (1.25x, 1.5x): more resizes, slightly lower throughput, tighter memory usage.

Real implementations differ. GCC's std::vector and Clang's libc++ both grow by 2x, though MSVC uses 1.5x. Java's ArrayList uses 1.5x. Python's list uses a particularly conservative formula that grows by roughly 12.5% plus a small constant, because Python lists are used everywhere. Short-lived buffers, function argument lists, large result sets: minimizing wasted memory matters more than shaving off an occasional resize.

The one thing that cannot change: the growth factor must be strictly greater than 1. An additive strategy (grow by 100 each time, or any constant) gives O(n²) total work and breaks the amortized guarantee entirely. You cannot trick your way out of the math.

Half Your Memory Could Be Empty

Amortized O(1) comes with a space cost. The array always allocates more than it currently needs.

In the worst case, you just finished a doubling. Capacity is 2n, elements are n, and half the allocated memory is empty. You're paying rent on rooms nobody lives in. On average, for a 2x growth factor, expect roughly 25% of capacity wasted.

The average Python list's memory layout after a big append run

node_modules might be the only thing that wastes more space than a freshly-doubled list.

If you know the final size upfront, preallocation skips every resize and removes the wasted space entirely.

# Python: preallocate to avoid any resizes arr = [None] * 1000 for i in range(1000): arr[i] = i
// Java: hint the initial capacity to ArrayList List<Integer> list = new ArrayList<>(1000);
// C++: reserve before filling std::vector<int> v; v.reserve(1000);

When you preallocate, you pay O(n) upfront for the allocation, then O(1) per assignment with no copies. In append-heavy code where size is predictable, this is a meaningful optimization.

What Interviewers Actually Test

The question "what's the time complexity of appending to a Python list?" has a common wrong answer: "O(1)". The correct answer is "amortized O(1)". Those are different claims, and a careful interviewer will make you feel the difference.

If an interviewer asks you to trace through an appending loop and state the total cost, the answer is O(n), not O(n²). The resizes across all doublings sum to 2n, not n². That's the key insight from the dynamic array time complexity analysis.

A few related patterns come up often:

Appending n elements to an empty list. Total cost: O(n). Copies across all resizes sum to at most 2n.

Prepending n elements. Each prepend shifts all existing elements right. That's O(n) per prepend, O(n²) total. Doubling doesn't help because the shift cost is unavoidable regardless of capacity. Use a deque if you need efficient operations at both ends.

Inserting in the middle. Same problem: shifting is O(n) per insert.

Deletion from the end. O(1). No resize needed. Most implementations don't shrink on deletion automatically, so capacity stays where it is.

Amortized vs guaranteed O(1). A real-time system cannot tolerate the occasional O(n) resize pause. In latency-sensitive code, you'd use a structure with true worst-case O(1) insertion, like a linked list of fixed-size chunks, accepting the cache-locality tradeoff. The array vs linked list tradeoff post covers when that trade makes sense.

What to say about complexity. If you write up a solution using a list and state "append is O(1)", a careful interviewer will push back. The correct phrasing is "amortized O(1) per append, O(n) total across n appends." That one word signals you understand the guarantee you're actually relying on. It's a small thing. It matters.

If you want to practice explaining amortized complexity out loud the way an interview requires, SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback. It's one thing to understand the bank account proof. It's another to explain it clearly under time pressure to someone who is writing your words down.

Further Reading