What Is Heapify? The Algorithm Every Heap Interview Tests

June 4, 20269 min read
dsaalgorithmsinterview-prepdata-structures
What Is Heapify? The Algorithm Every Heap Interview Tests
TL;DR
  • Heapify has two meanings: sift-down (fix one node in O(log n)) and build-heap (convert any array to a valid heap in O(n))
  • Build-heap is O(n), not O(n log n), because most nodes sit near the bottom of the tree and travel very few levels during sift-down
  • Sift-down is used in build-heap (not sift-up) specifically to achieve the linear bound; building with sift-up costs O(n log n)
  • Python's heapq.heapify() runs the O(n) build-heap procedure in-place, not n separate insertions
  • Both sift-down and sift-up are O(1) extra space when implemented iteratively, making heap sort uniquely space-efficient among O(n log n) sorts

You've called heapq.heapify() on a list. It worked. You moved on. Solid engineering.

Then an interviewer asks "so, why is heapify O(n)?" and you realize you've been using a priority queue the way you use a microwave: push a button, trust the magic, never open it up to see what's actually happening in there.

The heapify algorithm is not complicated. The time complexity is counterintuitive in exactly the way interviewers love to test. And "can you build a heap faster than O(n log n)?" is a direct question, with a yes answer, and most candidates have no idea.

The Heap Property, in One Sentence

A heap is a complete binary tree where every parent satisfies a fixed ordering relative to its children. Max-heap: every parent is greater than or equal to both children. Min-heap: the reverse.

The heap property says nothing about ordering between siblings or between non-adjacent nodes. That's intentional. It's weaker than full sorted order, which is exactly what makes insertion and deletion fast.

In practice, heaps live in arrays. Root at index 0. For a node at index i, left child is at 2i + 1, right child at 2i + 2, parent at (i - 1) // 2. No pointers, no allocations, just math.

What "Heapify" Actually Means

The word gets used two ways and conflating them is where interviews go sideways.

Sift-down (heapify-down): fix a single violating node by pushing it downward until the property is restored. Costs O(log n).

Build-heap: convert an arbitrary unsorted array into a valid heap by calling sift-down on every non-leaf node. Costs O(n). Not O(n log n). O(n).

Python's heapq.heapify() does the second one. When you call it on a list, you are not inserting n elements one by one. You are running the O(n) build-heap procedure. The naive "insert one at a time" approach looks nearly identical in code and costs O(n log n) instead. Know the difference.

Anatomy of a developer's browser: the path from "Stack Overflow" to "reading the documentation/source"

Every developer using heapq.heapify() vs the one who opened the CPython source at 11pm.

Sift-Down: A Concrete Walk-Through

Start with this array:

[1, 8, 3, 7, 6, 2, 4]

Drawn as a tree:

        1
       / \
      8   3
     / \ / \
    7  6 2  4

This violates the max-heap property at the root. Node 1 is smaller than both its children and has no business being on top. Run sift-down:

  1. Compare node 1 with left child 8 and right child 3. Largest is 8.
  2. Swap 1 and 8. Node 1 moves to index 1.
  3. Node 1's new children are 7 and 6. Largest is 7.
  4. Swap 1 and 7. Node 1 is now a leaf. Done.

Result:

        8
       / \
      7   3
     / \ / \
    1  6 2  4

Sift-down terminates in at most O(log n) steps because the tree has height log n. Each step moves the violating node one level down, and there are only log n levels to fall through.

Here is the iterative implementation:

def sift_down(arr: list[int], n: int, i: int) -> None: while True: largest = i left = 2 * i + 1 right = 2 * i + 2 if left < n and arr[left] > arr[largest]: largest = left if right < n and arr[right] > arr[largest]: largest = right if largest == i: break arr[i], arr[largest] = arr[largest], arr[i] i = largest

Iterative beats recursive here. Recursive sift-down works fine but burns O(log n) stack frames. The iterative version is O(1) space. The choice becomes relevant when your interviewer asks the space complexity follow-up, which they will.

Build-Heap: The O(n) Algorithm

Naive approach: insert n elements one by one using sift-up. Each insertion costs O(log n), total is O(n log n). Simple to implement, definitely worse.

Better approach: call sift-down on every non-leaf node, starting from the last non-leaf and working up to the root.

Leaves already satisfy the heap property by definition. No children means no property to violate. The last non-leaf is at index n // 2 - 1. Go from there to index 0.

def build_heap(arr: list[int]) -> None: n = len(arr) for i in range(n // 2 - 1, -1, -1): sift_down(arr, n, i)

Tracing through the array from before (n=7, last non-leaf = index 2):

Pass 1 (i=2, node=3): children are 2 and 4. Max is 4. Swap 3 and 4.

[1, 8, 4, 7, 6, 2, 3]

Pass 2 (i=1, node=8): children are 7 and 6. Node 8 is already largest. No swap.

[1, 8, 4, 7, 6, 2, 3]

Pass 3 (i=0, node=1): children are 8 and 4. Swap 1 and 8. Then 1's new children are 7 and 6. Swap 1 and 7.

[8, 7, 4, 1, 6, 2, 3]

Valid max-heap. Three passes. Not n passes. Three.

Why Build-Heap Is O(n), Not O(n log n)

Here is the argument most people make: we call sift-down on n/2 nodes, each costs O(log n), so total is O(n log n). Feels right. It's wrong.

Most nodes sit near the bottom of the tree and sift down very few levels. The cost of each sift-down call depends on the height of that node, not the height of the tree.

At height h from the bottom, there are at most ⌈n / 2^(h+1)⌉ nodes, and each costs O(h) to sift down. Total work:

T(n) = sum over h from 0 to floor(log n):
         ceil(n / 2^(h+1)) * O(h)
       = O(n * sum(h / 2^h))

The series ∑(h / 2^h) from h=0 to infinity converges to 2. So T(n) = O(2n) = O(n).

The intuition that actually sticks: half the nodes are leaves. Height 0. Zero work. A quarter are at height 1. One swap each. An eighth at height 2. Two swaps each. The tree front-loads its nodes at the bottom, and the exponentially shrinking count at each level outpaces the linearly growing cost per node. The math collapses to a constant.

Homer Simpson: "O(1) time complexity" looking slim vs "O(n^2) space complexity" in a fat suit

The recursive sift-down hides an O(log n) stack surcharge. The iterative version just looks like that in the first frame.

Sift-Up: The Other Direction

Sift-up is what runs when you push a new element into an existing heap. New element lands at the end of the array and bubbles up until the heap property is restored.

def sift_up(arr: list[int], i: int) -> None: while i > 0: parent = (i - 1) // 2 if arr[parent] < arr[i]: # for max-heap arr[parent], arr[i] = arr[i], arr[parent] i = parent else: break

Sift-up is O(log n). It is the operation behind heapq.heappush().

Build-heap uses sift-down specifically because it achieves O(n) total, while building with sift-up would cost O(n log n). If you already have all the data and need a heap, use build-heap. If you are inserting elements one at a time into a live heap, sift-up is your only option. Know which situation you are in.

What About Space?

Both operations are O(1) extra space when implemented iteratively. The heap is rearranged in-place. Recursive sift-down uses O(log n) stack space (one frame per level). Usually fine. Just know the number when someone asks.

How This Shows Up in Interviews

Three patterns come up constantly.

Top-K elements. The standard approach maintains a heap of size k as you scan the array, costing O(n log k). If k is close to n, heapq.heapify() in O(n) followed by k extractions is often faster to write and easier to reason about. Know when the full-array heapify beats the rolling heap. The Top-K heap pattern covers this in detail.

Heap sort. Build the heap in O(n), then extract the max n times for O(n log n) total. The interview point: heap sort is also O(1) extra space, unlike merge sort. Interviewers sometimes ask why heap sort is not the default sort despite that theoretical edge. Cache locality is the answer (see Heap Sort).

"Can you build a heap faster than O(n log n)?" Yes. O(n). Being able to explain why (height distribution, converging series) puts you ahead of candidates who only know that heaps exist. This is not an obscure question. It shows up.

Working through the O(n) proof out loud under time pressure is harder than reading it silently. SpaceComplexity runs voice-based DSA mock interviews with rubric feedback, so you can practice explaining build-heap the way you'd explain it to an actual interviewer, not just read it off a blog.

A Concrete Python Example

Python's heapq is a min-heap. To use it:

import heapq data = [1, 8, 3, 7, 6, 2, 4] heapq.heapify(data) # O(n), in-place print(data) # [1, 6, 2, 7, 8, 3, 4] -- valid min-heap smallest = heapq.heappop(data) # O(log n), returns 1 heapq.heappush(data, 5) # O(log n), inserts 5

For a max-heap, negate all values before inserting and negate again on extraction. It is mildly annoying. Everyone does it anyway.

The internal structure of data after heapify may not look sorted, and it does not need to. It only needs to satisfy the min-heap property at every index. data[0] is always the minimum. That is all the guarantee you get, and for priority queue use cases, it is all you need.

For a deeper look at how the heap data structure works under the hood, see the heap data structure guide. For practice problems, the top heap interview problems list covers every pattern you will encounter.

The Five Things to Know Cold

  • Sift-down fixes one violating node by pushing it downward: O(log n)
  • Build-heap calls sift-down on every non-leaf from bottom to root: O(n), not O(n log n)
  • The O(n) bound holds because most nodes sit near the bottom and sift down very few levels
  • Build-heap uses sift-down (not sift-up) specifically to achieve O(n) instead of O(n log n)
  • Both operations are O(1) extra space with iterative implementations

Further Reading