Heap vs Sorted Array: O(n) Insertion Is the Price of Full Order

May 28, 202610 min read
dsaalgorithmsinterview-prepdata-structures
Heap vs Sorted Array: O(n) Insertion Is the Price of Full Order
TL;DR
  • Heap insertion is O(log n). Sorted array insertion is O(n) because shifting all subsequent elements dominates.
  • Floyd's build-heap runs in O(n). Building a sorted array always costs O(n log n). There is no shortcut.
  • Binary search fails on a heap. The heap array has no monotone property across arbitrary positions, so the search has nothing to bisect.
  • Use a heap when insertions and extractions interleave: streaming, Dijkstra, top-k from an unknown-length stream.
  • Use a sorted array when the collection is static and you need range queries, rank access, or repeated binary search.
  • Use a balanced BST (TreeMap, std::set, SortedList) when you need O(log n) insertion and O(log n) search at the same time.
  • The heap's partial order is a feature: you only pay to maintain what you actually use.

Both data structures give you the minimum in O(1). Both live in arrays. The heap vs sorted array choice hinges on one thing: how often your data changes. One charges O(log n) to insert, the other O(n). The difference is not about speed tricks or implementation quirks. It is about how much ordering you actually keep, and why keeping less of it is often exactly right. If you pick wrong, you will find out at n = 10,000 items and not a moment before.

A frog in formal attire announcing the implementation of a sorting algorithm in production The energy before you realize your priority queue has been doing O(n) insertions all along.

A Heap Is Ordered. Not in the Way You Think.

A sorted array knows the complete ranking of every element. Index 0 is the minimum. Index 1 is the second-minimum. Binary search works because every element sits in its correct position relative to every other element. You can answer "what is the k-th smallest element?" in O(1) by reading arr[k-1].

A min-heap knows only one thing: every parent is smaller than or equal to its children. That is the entire invariant. The root is the minimum. Everything else is up for grabs. The left subtree might contain elements larger than some elements in the right subtree. Two siblings can be in any order relative to each other. You cannot binary-search a heap. There is no monotonic property across arbitrary positions.

Min-heap tree with backing array showing sibling nodes 3 and 2 in no enforced order, alongside a sorted array with binary search midpoint and O(1) rank access Heap: root is minimum, siblings are unrelated. Sorted array: every element knows its rank.

The partial order is a feature, not a limitation. You only pay to maintain what you actually use.

How Insertion Works (and Why the Sorted Array Is Having a Bad Time)

Heap insertion has two steps: append the new element to the end of the backing array, then bubble up. Swap with the parent, repeatedly, until the heap invariant holds. The complete binary tree has height log n, so the worst case is log n swaps.

import heapq h = [] heapq.heappush(h, 5) # [5] heapq.heappush(h, 1) # [1, 5] - 1 bubbles past 5 to the root heapq.heappush(h, 3) # [1, 5, 3] - 3 already satisfies invariant, no swap heapq.heappush(h, 2) # [1, 2, 3, 5] - 2 bubbles past 5 print(heapq.heappop(h)) # 1 - swap root with last element, then sift down

Sorted array insertion also has two steps: find the right position via binary search (O(log n)), then shift every subsequent element right by one to make room (O(n)). The search is cheap. The shift kills you.

import bisect arr = [1, 2, 3, 5] pos = bisect.bisect_left(arr, 4) # pos = 3, O(log n) arr.insert(pos, 4) # shifts [5] one slot right, O(n) # arr is now [1, 2, 3, 4, 5]

Python's bisect.insort looks deceptively fast. It is not. The binary search runs in O(log n). But the underlying list shift is O(n), and that dominates. For n insertions from scratch, the total is 0 + 1 + 2 + ... + (n-1) = O(n²). Which means your sorted-array priority queue just cost O(n²). The code review will be interesting. Even sorting all n elements in one batch costs O(n log n), which is better.

Building a heap, by contrast, costs O(n) via Floyd's bottom-up algorithm. Floyd's starts at the last non-leaf and calls sift-down on each node working toward the root. Most nodes live near the leaves and sift down very little. The total work is the sum, over each level h, of (n / 2^h) nodes each doing h swaps. That series converges to 2n. Build heap is O(n), build sorted array is O(n log n), and there is no way to close that gap.

Sorted array insertion showing the O(n) element shift, contrasted with heap insertion showing O(1) append followed by O(log n) bubble-up Sorted array: binary search finds the slot in O(log n), then O(n) shifts everything after it. Heap: append takes O(1), bubble up takes at most O(log n).

Heap vs Sorted Array: Performance at a Glance

OperationMin-HeapSorted Array
Peek minimumO(1)O(1)
InsertO(log n)O(n)
Extract minimumO(log n)O(1) seq. or O(n) arbitrary
Search for valueO(n)O(log n)
k-th smallestO(k log n)O(1)
Build from n itemsO(n)O(n log n)

Three rows dominate the decision: insert, search, and k-th smallest. Insert favors the heap by a full O(n) factor. Search and rank access favor the sorted array by a factor of n. Everything else is context.

The Heap Wins Every Time You Insert Between Extractions

The heap's advantage materializes whenever insertions and extractions interleave in an ongoing stream. Each extraction generates new work that requires new insertions. If you cannot collect all input upfront and sort it once, you need a structure that accepts new elements cheaply.

Dijkstra's algorithm is the textbook case. Pull the shortest-known-distance vertex from the frontier (extract-min), relax its edges, push neighbors back (insert). These alternate on every iteration. With a binary heap, each costs O(log V). Total: O((V + E) log V). With a sorted array, each insert shifts O(V) elements, so relaxing E edges costs O(EV). For sparse graphs where E = O(V), that is O(V²) versus the heap's O(V log V). The heap wins decisively. The full Dijkstra analysis shows how this plays out in practice.

The same pattern appears in top-k from a stream. You do not know when the stream ends, so you cannot sort first. A min-heap of size k works: push each incoming element, then pop the minimum if the heap exceeds k elements. After n elements, the heap holds the k largest values seen, with the k-th largest at the root. Each element costs O(log k), total O(n log k). A sorted array approach using bisect.insort costs O(k) per insert due to shifting, giving O(nk) total. For large k that is significantly worse.

Any problem that reads "process the item with the smallest priority next, and new items arrive as you go" is a heap problem.

The Sorted Array Wins When the Data Stops Changing

The sorted array earns its keep when the collection is static or nearly static and you need to query it in multiple ways.

Binary search only works because every element sits in its correct ranked position. This gives you O(log n) for range queries ("how many elements between 10 and 50?"), predecessor queries ("what is the largest value smaller than x?"), and nearest-neighbor queries ("what value is closest to target?"). The heap makes you scan every element for each of these queries, O(n) each.

Access by arbitrary rank is the most dramatic difference. arr[k-1] on a sorted array is O(1), regardless of k. On a heap, extracting the k-th smallest requires k separate extract-min operations at O(log n) each: O(k log n) total. To find the median of a heap, you pay O(n log n). The sorted array answers in a single array read.

The sorted array pattern is: sort once, query indefinitely. The O(n log n) upfront cost amortizes across every subsequent lookup. If you are searching the same set of IP addresses, timestamps, or product IDs thousands of times, build a sorted array first.

Two timelines showing dynamic collection with heap using O(log n) push and pop operations, versus static collection with one O(n log n) sort followed by many O(log n) binary searches Dynamic collection: every push and pop costs O(log n). Static collection: pay O(n log n) once, then every subsequent lookup is O(log n).

You Cannot Binary-Search a Heap (Please Stop Trying)

This is the mistake that comes up most often, usually right after someone reads that heaps are "ordered" and gets ambitious. People know a heap is "ordered" and try to apply binary search. It does not work.

Binary search requires the array to be monotonically non-decreasing from left to right. You need to know that everything at indices less than mid is smaller than everything at indices greater than mid. A heap's array layout gives you no such guarantee. The element at index 5 might be smaller than the element at index 2, depending on the shape of the tree, even though 5 > 2 as array indices.

Heap array [1,3,2,7,4,5,6] showing a binary search midpoint at index 3 with value 7, but index 4 has value 4 which is less than 7, breaking the monotone property binary search requires Binary search picks mid = index 3, value 7. But index 4 holds value 4, which is less. The monotone property is broken. The search returns garbage.

If you find yourself wanting to search a heap for a specific value, you have two honest options: scan the entire array in O(n), or switch to a different structure. If you need O(log n) insertion and O(log n) search simultaneously, you want a balanced BST. Java's TreeMap, C++'s std::set, and Python's sortedcontainers.SortedList give every operation in O(log n) at a larger constant factor than either a heap or a raw sorted array. The heap is not a lazy sorted array. It is a different invariant for a different problem.

The heap data structure deep dive covers the sift-up and sift-down mechanics in detail, including why the array index arithmetic parent = (i-1)//2 and left_child = 2*i+1 fall out of the complete binary tree structure.

One Question Settles This

You have a collection. Elements arrive and depart. You always need the smallest one next.

Do new elements arrive between extractions?

Yes: use a heap. The partial order is exactly what you need. You do not need the full order you would pay O(n) per insert to maintain.

No, data is static: sort once, then use binary search. Pay O(n log n) upfront and O(log n) per query indefinitely. The sorted array handles every subsequent lookup cheaply.

You need O(log n) insertion and O(log n) search simultaneously: use a balanced BST. Java's TreeMap, C++'s std::set, Python's sortedcontainers.SortedList. All operations stay O(log n), at the cost of a larger constant factor compared to a raw heap or array.

The heap is not the "fast version" of a sorted array. It answers a different question, under a weaker invariant, for the exact cost you actually need to pay.

Quick Reference

  • Heap invariant: every parent is smaller than or equal to its children. The root is the minimum. Siblings have no enforced relationship.
  • Sorted array: every element sits in its correct rank position. Binary search works. Random rank access is O(1).
  • Heap insert is O(log n). Sorted array insert is O(n) due to the shift.
  • Heap build is O(n) via Floyd's algorithm. Sorted array build is O(n log n), no shortcut exists.
  • Heap extract-min is O(log n). Sorted array extract-min is O(1) if sequential, O(n) if arbitrary.
  • You cannot binary-search a heap.
  • Heap wins for dynamic collections, priority queues, streaming problems, Dijkstra.
  • Sorted array wins for static data, range queries, rank access, binary search.
  • When you need both cheap insertion and search: balanced BST.

Saying "I'll use a heap because I need O(log n) insertion and I do not need binary search" is the kind of reasoning interviewers listen for. SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback so you can practice explaining tradeoffs like this out loud until it comes out cleanly under pressure.

Further Reading