What Is a Heap? The Data Structure Behind Priority Queues

June 4, 2026 · 9 min read
dsa, algorithms, interview-prep, data-structures
TL;DR
  • Heap property: every parent ≤ children (min-heap) or ≥ children (max-heap), giving O(1) access to the extreme element at all times
  • Array storage: binary heaps live in flat arrays using index arithmetic, not pointers, which improves cache locality over BSTs
  • O(n) build: heapq.heapify() converts any unsorted list to a valid heap in linear time; always prefer it over n individual pushes
  • Top-K pattern: to find the k largest elements, maintain a min-heap of size k and evict the weakest candidate whenever something larger arrives
  • Language gotcha: Python heapq is min-only (negate values for max behavior); C++ priority_queue is max-heap by default and needs std::greater to flip
  • Heap's blind spot: no efficient search by arbitrary value (O(n) scan required); reach for a BST or hash set when lookup matters

If you've ever typed heapq.heappush in an interview and silently hoped it worked the way you thought it did, this post is for you. If you've never used a heap and are wondering why it sounds like something your garbage collector manages, this post is for you too. (That's a different heap. Computer science naming is fine.)

A heap is a tree-based structure built on one rule: every parent is smaller than or equal to its children (min-heap) or larger than or equal to its children (max-heap). No ordering between siblings. No requirement for a sorted tree. Just the parent-child relationship, enforced at every node.

That single rule is enough to give you O(1) access to the minimum or maximum element in the entire structure. Which is why heaps show up everywhere: Dijkstra's algorithm, top-K problems, merge K sorted lists, and the sliding window maximum. Know heaps, and a whole class of "give me the best thing right now" problems becomes routine.

Min-Heap, Max-Heap, and the Rule That Runs Both

A min-heap keeps the smallest element at the root. A max-heap keeps the largest.

     Min-Heap           Max-Heap
         1                  9
       /   \              /   \
      3     5            7     5
     / \   / \          / \   / \
    4   8 7   6        3   2 4   1

In the min-heap, the root (1) is the global minimum. Every parent is smaller than its children. In the max-heap, the root (9) is the global maximum.

What the heap does not guarantee: any ordering between siblings, or any relationship between nodes at the same level. The 4 and 8 are siblings in the min-heap. That 4 < 8 is pure coincidence. The heap doesn't care, and it never will.

Under the Hood: It's Just an Array

Heaps look like trees in diagrams. In practice, they're stored as flat arrays. No nodes. No pointers. No therapy bill from chasing null references.

For a heap stored in array h with zero-based indexing:

  • Root is at index 0
  • Left child of node i is at 2*i + 1
  • Right child of node i is at 2*i + 2
  • Parent of node i is at (i - 1) // 2
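The index rules translate directly into three one-line helpers. This is a sketch: the function names are our own, not part of heapq. The example uses the min-heap from the diagram above, stored as the array [1, 3, 5, 4, 8, 7, 6].

```python
def parent(i: int) -> int:
    """Index of the parent of node i (only meaningful for i > 0)."""
    return (i - 1) // 2


def left(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i + 1


def right(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 2


h = [1, 3, 5, 4, 8, 7, 6]
# The node at index 1 holds 3; its children hold 4 and 8.
print(h[left(1)], h[right(1)])  # 4 8
# The node at index 5 holds 7; its parent is index 2, which holds 5.
print(h[parent(5)])  # 5
```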

The min-heap above maps to:

Index: 0  1  2  3  4  5  6
Value: 1  3  5  4  8  7  6

The tree is implicit. You navigate it with arithmetic, not pointers. This works because binary heaps are complete binary trees: every level is fully filled except possibly the last, which fills left-to-right. That completeness makes the index math exact.
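Completeness also means only the first n // 2 slots have children; everything after them is a leaf. A small checker (the name is_min_heap is hypothetical) uses that fact to verify the heap property on any array:

```python
def is_min_heap(h: list[int]) -> bool:
    """Return True if every parent is <= each of its children."""
    n = len(h)
    # Only indices 0 .. n // 2 - 1 have children; the rest are leaves.
    for i in range(n // 2):
        left, right = 2 * i + 1, 2 * i + 2
        if h[i] > h[left]:  # left child always exists in this range
            return False
        if right < n and h[i] > h[right]:
            return False
    return True


print(is_min_heap([1, 3, 5, 4, 8, 7, 6]))  # True
print(is_min_heap([1, 3, 5, 4, 2, 7, 6]))  # False: 3 at index 1 > 2 at index 4
```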

[Figure: heap tree to array mapping]

Storing a tree in an array eliminates pointer overhead and improves cache locality. Every node lives in one contiguous block, and the top few levels, which every operation touches, sit side by side at the front of the array. Moving from parent to child is an index calculation rather than a chase after a pointer to somewhere arbitrary on the heap (the memory heap, which is a different thing, named by someone who apparently wanted to cause confusion for generations).

How the Operations Work

Peek: O(1)

The minimum (or maximum) is always at index 0. Read it directly. That's the whole section.

Insert: O(log n)

Append the new element to the end of the array. Then "sift up": compare with the parent, swap if the heap property is violated, repeat until you hit the root or the property holds.

```python
def push(heap: list[int], val: int) -> None:
    heap.append(val)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] > heap[i]:  # min-heap: parent should be smaller
            heap[parent], heap[i] = heap[i], heap[parent]
            i = parent
        else:
            break
```

Worst case: the new element bubbles all the way from a leaf to the root. A complete binary tree with n nodes has height O(log n), so insert is O(log n).

Extract-Min: O(log n)

The minimum is at index 0. To remove it without breaking the array structure:

  1. Swap the root with the last element.
  2. Remove the last element (that's your extracted minimum).
  3. "Sift down" the new root: compare with both children, swap with the smaller child if the heap property is violated, repeat.
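
```python
def pop(heap: list[int]) -> int:
    # Raises IndexError on an empty heap, like heapq.heappop.
    # 1. Swap the root with the last element.
    heap[0], heap[-1] = heap[-1], heap[0]
    # 2. Remove the last element: that's the old minimum.
    val = heap.pop()
    # 3. Sift the new root down, always swapping with the smaller child.
    i, n = 0, len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            break  # heap property restored
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
    return val
```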

Worst case traverses the full height: O(log n).

Build-Heap: O(n) (Not O(n log n))

Given an unsorted array, you can heapify it in O(n). Not O(n log n). This trips people up.

Start from the last non-leaf node (index n // 2 - 1) and work backward to the root, sifting each node down. Leaves never need sifting. Nodes near the bottom do minimal work. Only nodes near the root face a full O(log n) sift. The math works out to O(n) because most nodes sit near the bottom and barely move.
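For readers who want the arithmetic behind that claim, the standard textbook bound goes like this: measure a node's height h from the leaves, note that at most ⌈n / 2^(h+1)⌉ nodes have height h, and charge each of their sift-downs O(h):

$$
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
= O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
= O(n), \qquad \text{since } \sum_{h=0}^{\infty} \frac{h}{2^{h}} = 2.
$$

The series converges, so the total work is a constant multiple of n, even though the root alone may sift through all ⌊log₂ n⌋ levels.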

heapq.heapify() in Python runs in O(n). If you're building from a known list, always use it rather than pushing elements one by one.

```python
import heapq

nums = [5, 3, 8, 1, 9, 2]
heapq.heapify(nums)  # O(n), in-place, min-heap
print(nums[0])  # 1
```
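To see roughly what heapq.heapify is doing, here is a hand-written bottom-up build for a min-heap. It's a sketch for illustration: the names sift_down and build_heap are our own, and CPython's heapq uses a slightly different sift-down variant internally, so the resulting array can differ even though both are valid heaps.

```python
def sift_down(h: list[int], i: int) -> None:
    """Move h[i] down until neither child is smaller (min-heap)."""
    n = len(h)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and h[left] < h[smallest]:
            smallest = left
        if right < n and h[right] < h[smallest]:
            smallest = right
        if smallest == i:
            return
        h[i], h[smallest] = h[smallest], h[i]
        i = smallest


def build_heap(h: list[int]) -> None:
    """Bottom-up heap construction, in place, O(n) total."""
    # The last non-leaf is at n // 2 - 1; leaves need no sifting.
    # Walk backward so every subtree below i is already a heap.
    for i in range(len(h) // 2 - 1, -1, -1):
        sift_down(h, i)


nums = [5, 3, 8, 1, 9, 2]
build_heap(nums)
print(nums[0])  # 1
```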

Complexity at a Glance

| Operation | Time | Notes |
|---|---|---|
| Peek min/max | O(1) | Always at index 0 |
| Insert | O(log n) | Sift up from the last leaf |
| Extract min/max | O(log n) | Sift down from the root |
| Build-heap | O(n) | Sift down, starting from the middle of the array |
| Search by value | O(n) | No faster path exists |

Space is O(n) for the array.

The search row is the one people forget. Heaps don't support efficient lookup by value. If you need to check whether a specific element exists, you scan the whole array. Every element. A hash set is the right tool for that, not a heap.

Max-Heap in Python (and Why the Default Will Bite You)

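Python's heapq is min-only for most practical purposes: max-heap functions such as heapq.heappush_max and heapq.heappop_max only arrived in Python 3.14. The standard workaround, which works on every version, is one of those things that looks wrong the first ten times you do it: negate your values on the way in, negate again on the way out.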

```python
import heapq

heap: list[int] = []
heapq.heappush(heap, -5)
heapq.heappush(heap, -1)
heapq.heappush(heap, -9)
print(-heapq.heappop(heap))  # 9
```

It works because a min-heap of negative numbers behaves exactly like a max-heap of positive ones. The logic is sound. The code still looks like you're trying to hide something.

Java's PriorityQueue is also min-heap by default. Pass Collections.reverseOrder() for max behavior. C++ goes the other way: std::priority_queue is a max-heap by default, and you need std::greater<int> to get a min-heap. Know the default orientation for your language before you reach for the priority queue in an interview. Getting this wrong burns time, and then explaining why you're negating everything burns more time.

The Four Patterns That Keep Showing Up

Top-K Elements

Find the k largest elements from n items.

Naive: sort everything, take the last k. That's O(n log n).

With a heap: maintain a min-heap of size k. For each new element, if it's larger than the root (the current weakest top-k candidate), pop the root and push the new element. After processing everything, the heap contains exactly the k largest.

To find the k largest, use a min-heap of size k. You're evicting the weakest candidate whenever something better shows up. The counterintuitive direction is the whole point.

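```python
import heapq


def top_k_largest(nums: list[int], k: int) -> list[int]:
    heap: list[int] = []  # min-heap of the current top-k candidates
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)  # evict the weakest candidate
    return heap  # the k largest, in heap order (not sorted)
```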

Time: O(n log k). Space: O(k). Both beat sorting when k is much smaller than n.
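The function above pushes first and pops second. The description earlier (compare with the root, and only then replace) can be written more literally with heapq.heapreplace, which pops the root and pushes the new item in a single sift. A sketch, with a hypothetical function name:

```python
import heapq


def top_k_largest_replace(nums: list[int], k: int) -> list[int]:
    """Return the k largest values, in heap order (not sorted)."""
    if k <= 0:
        return []
    heap = nums[:k]  # seed with the first k items
    heapq.heapify(heap)  # O(k)
    for num in nums[k:]:
        # Only touch the heap when num beats the weakest candidate at the root.
        if num > heap[0]:
            heapq.heapreplace(heap, num)  # pop root, push num: one O(log k) sift
    return heap


print(sorted(top_k_largest_replace([3, 1, 5, 12, 2, 11], 3)))  # [5, 11, 12]
```

Elements that can't make the cut cost a single comparison against heap[0], which is why this form tends to be preferred when most inputs are small.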

K-th Largest Element

Same idea, simpler result. Maintain a min-heap of size k. After processing all elements, the root is your answer. This is LeetCode 215. It's worth implementing manually at least once before you rely on heapq.nlargest.
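A manual version, as a sketch: the function name is hypothetical, and it assumes 1 <= k <= len(nums).

```python
import heapq


def kth_largest(nums: list[int], k: int) -> int:
    """Return the k-th largest element (1-based k, 1 <= k <= len(nums))."""
    heap: list[int] = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)  # drop the smallest; the heap keeps the k largest
    return heap[0]  # weakest of the k largest = the k-th largest


print(kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

The one-liner heapq.nlargest(k, nums)[-1] gives the same answer and is a handy cross-check.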

Merge K Sorted Lists

You have k sorted lists. Push the first element of each into a min-heap along with which list it came from. Repeatedly extract the minimum, output it, then push the next element from that same list. The heap holds exactly one element per unfinished list and always surfaces the smallest of them, at O(log k) cost per step.

Time: O(n log k) where n is total elements across all lists. This is LeetCode 23, and it shows up in streaming and external merge sort problems too.
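A sketch of that loop, assuming plain sorted Python lists rather than the linked lists LeetCode 23 uses; the function name is hypothetical.

```python
import heapq


def merge_k_sorted(lists: list[list[int]]) -> list[int]:
    """Merge k sorted lists into one sorted list in O(n log k)."""
    # Heap entries: (value, list index, element index). Each list has at most
    # one entry in the heap, so ties on value are broken by the list index and
    # the third field is never compared.
    heap: list[tuple[int, int, int]] = [
        (lst[0], li, 0) for li, lst in enumerate(lists) if lst
    ]
    heapq.heapify(heap)
    out: list[int] = []
    while heap:
        val, li, ei = heapq.heappop(heap)
        out.append(val)
        if ei + 1 < len(lists[li]):
            # Refill from the same list the minimum came from.
            heapq.heappush(heap, (lists[li][ei + 1], li, ei + 1))
    return out


print(merge_k_sorted([[1, 4, 5], [1, 3, 4], [2, 6]]))  # [1, 1, 2, 3, 4, 4, 5, 6]
```

In production Python, heapq.merge does the same job lazily over any number of sorted iterables.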

Median of a Data Stream

Maintain two heaps: a max-heap for the lower half of values, a min-heap for the upper half. Keep them balanced (sizes differ by at most one). The median is either the root of the larger heap or the average of both roots.

Insert into the appropriate heap, then rebalance if sizes diverge. Both operations are O(log n). Reading the median is O(1). This is LeetCode 295, and it's a regular at Google and Meta. The two-heap trick is the kind of insight that looks obvious in retrospect and takes a while to arrive at the first time.
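A sketch of the two-heap approach, assuming numeric inputs and that median() is only called after at least one add(); the class and method names are hypothetical. It reuses the negation trick for the lower half, and it routes every new value through the lower heap before rebalancing, which is one common way to "insert into the appropriate heap".

```python
import heapq


class MedianFinder:
    """Running median with two heaps."""

    def __init__(self) -> None:
        self.low: list[int] = []  # max-heap of the lower half, stored negated
        self.high: list[int] = []  # min-heap of the upper half

    def add(self, num: int) -> None:
        # Push into the lower half, then move its largest value to the upper
        # half, so every value in low stays <= every value in high.
        heapq.heappush(self.low, -num)
        heapq.heappush(self.high, -heapq.heappop(self.low))
        # Rebalance so low holds the extra element when the count is odd.
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self) -> float:
        if len(self.low) > len(self.high):
            return float(-self.low[0])
        return (-self.low[0] + self.high[0]) / 2


mf = MedianFinder()
for x in [5, 15, 1, 3]:
    mf.add(x)
    print(mf.median())  # 5.0, then 10.0, then 5.0, then 4.0
```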

Where the Heap Falls Apart

The heap is purpose-built for one thing: give me the best element right now. It does that well. Ask it for anything else and it starts to crack:

  • You can't efficiently search for an arbitrary value. Want to check if 42 is in the heap? Scan the whole array like a dog looking for a treat you already ate.
  • You can't traverse elements in sorted order without destroying the heap. Each extract costs O(log n), so pulling everything out is O(n log n), at which point you could have just sorted.
  • You can't efficiently update an arbitrary element's priority without a direct reference to it.
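A quick illustration of the first two bullets with heapq: a membership test with `in` walks the whole list, and getting sorted order out of the heap means popping every element, which leaves the heap empty.

```python
import heapq

heap = [5, 3, 8, 1, 9, 2]
heapq.heapify(heap)
print(heap)  # a valid heap order, generally not sorted
print(42 in heap)  # False, but only after an O(n) scan of every element

# Sorted traversal means popping everything: n pops at O(log n) each.
drained = [heapq.heappop(heap) for _ in range(len(heap))]
print(drained)  # [1, 2, 3, 5, 8, 9]
print(heap)  # [] : the heap is gone
```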

For ordered access or search-by-value, a BST is a better fit. For repeated minimum-extraction, nothing beats the heap.

Where to Go From Here

The full heap deep dive covers the sift-down proof, every language implementation, and the difference between binary heaps and Fibonacci heaps. For interview-specific practice, the top heap interview problems list covers every pattern you're likely to encounter.

If you want to practice top-K, sliding window maximum, or median-of-stream problems in a realistic setting, SpaceComplexity runs voice-based DSA mock interviews with rubric feedback. Heap problems are easy to code and surprisingly easy to explain poorly. The rubric catches both.

Further Reading