Amortized Analysis: When One Expensive Operation Pays for Itself

Python's list.append() is documented as O(1). Dig into the CPython source and you'll find it occasionally runs an O(n) memory copy. Both claims are correct. You are not being gaslit.

The core idea: consider the total cost of a long sequence of operations, not just the worst case of a single one. If n operations together cost O(n), each one averages O(1), and that average holds against any sequence of operations you can construct, not just lucky ones.

The reconciliation is amortized analysis, and once you understand it, you'll read every "O(1) insert" claim with a knowing look instead of blind trust.

The O(1) guarantee, unmasked. r/ProgrammerHumor

The O(1) claim, unmasked. Both panels are correct.

The Cost Sequence You're Not Looking At

Start with an empty dynamic array (capacity 1) and append 16 elements. Track the cost of each append, counting 1 for each element written, whether into the slot (insert) or moved to the new buffer (copy during resize).

Append:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Cost:     1   2   3   1   5   1   1   1   9   1   1   1   1   1   1   1

The spikes at positions 2, 3, 5, and 9 are resize events. Each time the array was full, it doubled: capacity 1 → 2 → 4 → 8 → 16. Copying the old elements costs O(old capacity) each time.

Total across 16 appends: 1+2+3+1+5+1+1+1+9+7 = 31. Less than 2 per append on average.

The spikes grow, but they happen less and less often. By the time you pay 9 for a resize, you've had 8 cheap appends since the previous one. The expensive operation is self-limiting. A resize is only possible after a full half of the array filled up with cheap operations. (These are also the operations that show up in your profiler right before your manager asks why the dashboard is slow.)

Bar chart of 16 consecutive append operations on a dynamic array. Blue bars show cheap O(1) inserts; amber bars show resize events at appends 2, 3, 5, and 9. The average cost (dashed line) stays well below 2 despite the spikes.

Cost per append: cheap operations dominate, expensive ones self-regulate.

What "Amortized O(1)" Actually Guarantees

This is the part that trips people up. Amortized analysis is not average-case analysis. No probabilistic assumptions. No distribution over inputs.

The guarantee: for any sequence of n operations on this data structure, the total cost is at most n × (amortized cost). A worst-case bound on the sequence, not per operation.

Compare the three complexity flavors:

Worst-case per operation: each individual call costs at most X. Tight, but often too pessimistic when expensive and cheap operations can't both happen at once.
Average-case: over a random input distribution, the expected cost is X. Requires an input model you may not trust.
Amortized: any sequence of n operations costs at most n·X total. No assumptions. A genuine guarantee.

Expensive operations are possible only after a long run of cheap ones. The expensive operation resets the internal state to cheap mode, so the sequence self-regulates. Robert Tarjan formalized this in 1985 in "Amortized Computational Complexity," SIAM Journal on Algebraic and Discrete Methods. It remains the canonical reference.

Method One: Just Add It Up

The aggregate method is the most direct. Compute the total cost T(n) for n operations, then divide: amortized cost = T(n)/n.

The binary counter is the classic warm-up. A k-bit counter starts at 0 and counts up. Each increment flips bits from the bottom. In the worst case, one increment flips all k bits (transitioning from 0111...1 to 1000...0). Naive bound: n increments flip at most n·k bits, so O(n log n) total. Too pessimistic, and here's why.

def increment(bits):
    i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0
        i += 1
    if i < len(bits):
        bits[i] = 1

Track how often each position flips across n increments:

Bit position j	Flips every ...	Total flips in n increments
0	1 increment	n
1	2 increments	n/2
2	4 increments	n/4
j	2ʲ increments	n/2ʲ

Bit 0 has never taken a day off. It flips every single increment, forever. Bit 7 barely wakes up. Track how often each position flips and the geometric series does the work for you.

Total flips = n·(1 + 1/2 + 1/4 + ...) ≤ n · Σ(1/2ʲ) = n · 2 = 2n.

So n increments flip at most 2n bits. Amortized cost per increment = 2 = O(1). The O(n log n) bound was off by a log factor because it assumed every bit could flip on every increment. It can't.

Horizontal bar chart showing flip frequency per bit position. Bit 0 flips n times (full bar), bit 1 flips n/2 times (half bar), and so on geometrically. The total across all bits sums to 2n.

Geometric decrease in flip frequency. Summing the bars gives 2n, not n log n.

Method Two: Pay It Forward

The accounting method (also called the banker's method) assigns an amortized cost to each operation type. When amortized > actual cost, store the surplus as credit on specific objects in the data structure. When actual > amortized, spend stored credit. The constraint: credit must never go negative, anywhere.

Data structures, unlike people, always honor their debts.

For the dynamic array, claim an amortized cost of 3 per append. Here is where the 3 credits go on each append:

1 credit: pays for writing the new element right now.
1 credit: stored on the new element itself, earmarked to pay for copying it during a future resize.
1 credit: stored on an older element in the first half of the array, earmarked to pay for copying it during a future resize.

When the array fills to n elements and resizes to 2n, we must copy n elements. At that moment, n/2 elements were appended since the last resize, each responsible for 2 stored credits: 1 on itself, 1 deposited on an older element. That is n credits total, exactly enough to pay for all n copies.

Credit never goes negative, so 3 is a valid amortized cost. Total cost for m appends: at most 3m.

Array of 8 elements at full capacity. Elements 1 through 4 (old) each hold one stored credit deposited by the new elements. Elements 5 through 8 (new) each hold one stored credit on themselves. Total is 8 credits, exactly the cost of copying all 8 elements at the next resize.

8 stored credits, 8 elements to copy. The bank balance hits zero at the resize and immediately starts charging up again.

Method Three: The Physics Metaphor That Actually Works

The potential method is the most powerful. It assigns a "potential" Φ to the data structure as a whole, representing stored work. The amortized cost of each operation is:

amortized_cost = actual_cost + Φ(after) − Φ(before)

Sum this across all operations and the Φ terms telescope:

Σ amortized_cost = Σ actual_cost + Φ(final) − Φ(initial)

If Φ never drops below Φ(initial), then total amortized cost is an upper bound on total actual cost.

For the dynamic array, define Φ(D) = 2 · size − capacity.

Right after any resize (array doubled from n to 2n slots, n+1 elements now stored), Φ = 2(n+1) − 2n = 2. The potential drops back near zero after each resize. Between resizes, each plain insert adds 2 to Φ. By the time the array fills again, Φ has climbed up to exactly the number of elements that will need to be copied. The potential charges up steadily during the cheap phase and discharges entirely during the expensive one.

Case 1: Insert without resize (size n, capacity m, n < m)

	Value
Actual cost	1
Φ before	2n − m
Φ after	2(n+1) − m
ΔΦ	+2
Amortized cost	1 + 2 = 3

Case 2: Insert with resize (array full: size n = capacity n)

	Value
Actual cost	n + 1 (copy n, insert 1)
Φ before	2n − n = n
Φ after	2(n+1) − 2n = 2
ΔΦ	2 − n
Amortized cost	(n+1) + (2−n) = 3

Both cases give exactly 3. The potential absorbed the spike, spreading its cost backwards onto every cheap append that built up Φ in the first place.

Non-negativity holds throughout: right after any resize Φ = 2 > 0, and between resizes Φ only increases. So Φ(final) ≥ Φ(initial) = 0, and the analysis is valid for any sequence of appends.

Line graph of the potential function Φ over a sequence of append operations. Between resizes, Φ climbs linearly at slope 2. At each resize event (amber dashed verticals), Φ drops sharply back to 2. Three sawtooth teeth are shown, each taller than the last, mirroring the growing resize costs.

The potential charges up during cheap appends and fully discharges at each resize. Same story, every cycle.

Why Doubling Is Non-Negotiable

Suppose instead of doubling, the array grows by a fixed constant k on each resize. After n appends, there are roughly n/k resizes, with copy costs k, 2k, 3k, ..., up to n.

Total copy cost: k + 2k + 3k + ... + n ≈ n²/(2k). That is Θ(n²) total, or Θ(n) per append.

Linear growth is the "save for retirement by putting $5 a month in a mattress" strategy. Technically forward motion. Not a plan.

What makes doubling work is the geometric series:

1 + 2 + 4 + 8 + ... + n/2 = n − 1 < n

A geometric series converges (in total cost terms). An arithmetic series does not. Any growth factor greater than 1 gives O(1) amortized append, because the series 1 + r + r² + ... = 1/(1−r) converges for any r < 1. Java's ArrayList uses 1.5×. CPython's list uses roughly 1.125× after the first few elements (the exact formula is new_size + (new_size >> 3) + 6 for larger lists). Both give O(1) amortized. Linear growth is the one strategy that breaks it.

Your Hash Map Has Been Doing This the Whole Time

When a hash map's load factor crosses a threshold, it rehashes: allocates a new, larger bucket array and reinserts every element. Same structure as dynamic array resize. Same amortized argument.

With geometric resizing, the total rehash cost across n insertions is 1 + 2 + 4 + ... < 2n. Each insertion is O(1) amortized. Java's HashMap triggers at load factor 0.75, Python's dict at roughly 2/3. The threshold shifts constants, not asymptotic behavior. Resize twice as often and you roughly halve the constant, but the O(1) amortized guarantee remains.

Python devs: your dict has been quietly rehashing your whole career. Java devs: same. Neither told you. You've been fine.

The potential function for hash maps is slightly more involved (it needs to handle both expansion on insert and contraction on delete), but the mechanics are identical: potential climbs during cheap operations, drops during rehash. The load-factor-vs-time plot is just another sawtooth. Same teeth, different labels.

What Amortized Analysis Does Not Give You

Amortized cost does not compose naively with worst-case analysis. If operation A is O(1) amortized and you call it n times in a loop, the loop is O(n) amortized total. But if your outer algorithm also modifies the data structure in ways that interact with its potential, you need to account for the full operation sequence. You cannot mix amortized bounds from different analyses without care.

More important: amortized O(1) is not O(1) worst-case per operation. If you need a hard latency bound (game engines updating at 60 fps, embedded controllers, real-time DSP), amortized is insufficient. A resize that takes O(n) will occasionally hit. That 300ms UI freeze your users filed a bug report about? Could be exactly this. That is why specialized structures exist with genuine O(1) worst-case: circular buffers, real-time deques, and gap buffers. Amortized covers total throughput. It says nothing about any individual spike.

Three Methods, One Guarantee

Amortized analysis bounds the total cost of a sequence of n operations. It is a worst-case guarantee on the sequence, not per operation, and makes no probabilistic assumptions.
Aggregate method: compute total cost T(n), divide by n. Best when operations are symmetric. Binary counter: total flips ≤ 2n, amortized cost = 2.
Accounting method: assign amortized cost per operation, store excess as credit on data structure objects, spend credit on expensive operations. Credit must never go negative.
Potential method: define Φ on the data structure state. Amortized cost = actual + ΔΦ. Φ terms telescope; if Φ stays non-negative, amortized total is a valid upper bound.
Dynamic array append is O(1) amortized because doubling makes total copy cost a geometric series bounded by 2n. Any constant growth factor > 1 works.
Hash map rehash follows the same analysis. Load factor threshold controls the constant, not the asymptote.
Amortized O(1) ≠ O(1) worst-case. Real-time systems need both.

If an interviewer asks "what is the amortized complexity of push?" and then immediately asks "what is the worst-case cost of a single push?", those are two different questions with two different answers. Explaining both clearly and quickly, under pressure, in a voice interview is a distinct skill from just knowing the math. SpaceComplexity puts you in that exact situation: voice-based mock interviews with rubric-based feedback that catches compressed or confused reasoning before you take it into a real loop.