Merge Sort vs Heap Sort: Same Complexity, Opposite Trade-offs

Merge sort and heap sort both promise Θ(n log n) in every case. No best-case shortcut, no worst-case blowup. On paper they're equivalent. In practice, merge sort is two to four times faster on large inputs. Both earn that bound. The gap lives in the constant factor hiding inside it, and in the one property heap sort can never recover: stability.

The decision reduces to two questions. Do you need a stable sort? Do you have a hard memory constraint? If the first answer is yes, you're done. If the second answer is yes and the first is no, heap sort earns its place. Every other situation is merge sort's to lose.

The One Table That Ends the Argument

Merge sort is O(n) space, stable, and cache-friendly. Heap sort is O(1) space, not stable, and cache-hostile.

That's it. If you need to sort a linked list, perform an external sort across disk, or maintain the relative order of equal elements, merge sort wins. If you're on an embedded system with 128 bytes of free memory and no stack budget, heap sort wins. For everything else, general-purpose in-memory sorting, you'll use introsort: quicksort with heap sort bolted on as a safety net.

(GCC and Clang don't use heap sort as their primary sort. They use it when quicksort recurses so deep they worry about the stack. That's the level of trust heap sort has earned from the compiler writers who know it best.)

Merge Sort: Split, Recurse, Merge

The algorithm is clean. Split the array in half, recursively sort each half, then merge the two sorted halves in one linear pass. Two pointers walk the halves from left to right, always appending the smaller element to the output. When ties happen, the left element goes first. That single rule (take left on tie) is what makes merge sort stable.

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= here is what buys stability
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

Merge sort recursion tree showing log₂ n levels with n total work at each level for input [5,2,8,1,9,3] Each level does exactly n total work across all merge calls. log₂ n levels times n work = Θ(n log n). No tricks required.

The tree has log₂ n levels. Every level does exactly n total work across all its merge calls. One n per level, log₂ n levels: T(n) = Θ(n log n).

You can confirm with the Master Theorem. The recurrence is T(n) = 2T(n/2) + Θ(n). Pull out the constants: a = 2, b = 2, f(n) = n. The watershed function is n^(log_b a) = n^(log₂ 2) = n^1 = n. Since f(n) = Θ(n) matches the watershed exactly, we're in Case 2. Result: T(n) = Θ(n log n). No surprises.

Space cost: the merge step writes into a new output array at each level. At any point in the recursion only one path from root to leaf is active, and the largest output array at any level is size n at the root. Merge sort needs O(n) auxiliary space. For sorted linked lists, this drops to O(log n) (just the call stack), because merging two sorted lists requires only pointer rewiring.

Heap Sort: Build the Heap, Then Drain It

Heap sort runs in two distinct phases.

Phase 1 builds a max-heap in-place. A max-heap is a complete binary tree stored as an array where every parent is larger than its children. Index i has children at 2i+1 and 2i+2, and parent at ⌊(i-1)/2⌋. To build it, call sift-down on every non-leaf node, starting from the last non-leaf (index ⌊n/2⌋ - 1) down to the root.

Phase 2 extracts the maximum repeatedly. Swap arr[0] (the maximum) with arr[n-1], shrink the heap boundary by one, then sift-down from the root to restore the heap property. After n-1 extractions, the array is sorted in ascending order.

def heap_sort(arr):
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):  # build max-heap bottom-up
        sift_down(arr, n, i)
    for end in range(n - 1, 0, -1):       # drain: swap max to back, re-heapify
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, end, 0)

def sift_down(arr, n, i):
    while True:
        largest, left, right = i, 2*i+1, 2*i+2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            break
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest

Max-heap shown as both a flat array [9,8,7,5,3,6,4] with index labels and as its equivalent binary tree, with parent/child index formulas annotated The array and the tree are the same data structure. The index arithmetic (2i+1, 2i+2) is the only difference. That arithmetic is also the source of all your cache problems.

Total time: O(n) for build-heap (see below) plus O(n log n) for n sift-down extractions = Θ(n log n) in all cases. No input permutation helps it. Heap sort is not adaptive.

Build-Heap Is O(n), Not O(n log n)

The intuitive guess is wrong. Sift-down each of n nodes at O(log n) each: O(n log n). But this overcounts badly.

Half the nodes are leaves. They need zero sift-down work. A quarter sit at height 1 and need at most one swap. An eighth are at height 2 with at most two swaps. The cheap work dominates.

Formally: there are at most ⌈n / 2^(h+1)⌉ nodes at height h, and sifting one down takes at most h comparisons. Total work across all heights:

T_build ≤  Σ_{h=0}^{⌊log n⌋}  ⌈n / 2^(h+1)⌉ × h

         ≤  n × Σ_{h=0}^{∞}  h / 2^h

         =  n × 2                  (closed form below)

         =  O(n)

The sum uses the identity Σ_{h=0}^{∞} h × x^h = x / (1 − x)², which converges to 2 when x = 1/2. Build-heap is Θ(n), not Θ(n log n). Robert Floyd established this in 1964, turning heap sort's build phase from a slow start into a freebie. The O(n log n) total comes entirely from the extraction phase.

Bar chart showing node count and sift-down work per level in a 16-element heap. Leaves (h=0): 8 nodes, 0 swaps each. Root (h=4): 1 node, 4 swaps. Total work area is bottom-heavy, summing to O(n). The distribution is bottom-heavy: the levels with the most nodes do the least work. That's the geometric series argument made visible. Floyd knew.

The O(1) Space Has a Hidden Cost

Here's where heap sort's one advantage quietly destroys its own performance on anything built after 2005.

Merge sort reads memory sequentially. During the merge step, two pointers walk two contiguous arrays from front to back. Modern CPUs detect this stride pattern and prefetch the next cache line before you need it. Each 64-byte cache line holds 16 int32 values. Merge sort gets 16 reads for the price of one cache miss.

Heap sort jumps. When sifting down from index i, the code checks indices 2i+1 and 2i+2. For a 1M-element heap, those indices point to memory addresses that are megabytes apart. Every node in the sift-down path is a cache miss.

Once the heap exceeds the L1 cache (typically 32KB, about 8,000 int32s), every sift-down comparison is a fresh L1 miss. Build-heap and extraction together run roughly 2n sift-down calls, each visiting O(log n) nodes. Total cache misses: Θ(n log n). Merge sort's cache misses scale closer to Θ(n) because sequential access lets the prefetcher do its job. The empirical result: merge sort outperforms heap sort by a factor of 2 to 4 on large inputs, on any CPU made in the last decade.

This is exactly why GCC and Clang's std::sort is not heap sort. They use introsort: quicksort first, then switch to heap sort only when recursion depth exceeds 2 × log₂(n) to guarantee O(n log n) worst-case. Heap sort is the safety net, not the primary tool.

Side-by-side access timeline comparison: merge sort shows sequential green dots (one cache miss loads 16 elements), heap sort shows scattered red dots with exponentially growing jump distances (one miss per access, prefetcher gives up) Same asymptotic complexity. Wildly different cache behavior. Merge sort's prefetcher is happy. Heap sort's prefetcher filed for early retirement.

Heap Sort Cannot Be Made Stable

Heap sort is not stable, and no implementation trick fixes this without extra space. When two equal elements sit at different positions in the heap, the one that gets promoted first depends on heap topology: which sibling is larger, not which came earlier in the input. To recover stability you'd need to augment every element with its original index and compare that as a tiebreaker. Cost: O(n) extra space. At that point you've surrendered the only thing heap sort had going for it.

Merge sort is stable by default. The <= comparison in the merge step guarantees the left element (earlier in input order) wins every tie. No augmentation required. This matters for multi-key sorting: sort by secondary key first, then by primary key with a stable sort, and the secondary order is preserved within each primary group. An unstable sort silently scrambles the secondary order and gives you a bug that only shows up in production, on a Friday, with data you weren't expecting.

If your comparison function is expensive (say, sorting records by a slow string comparison), merge sort's smaller constant factor and better cache behavior widen the gap further.

Where These Algorithms Actually Live

Python's sorted() and Java's Arrays.sort(Object[]) both use Timsort, a merge sort variant that detects existing runs in the data and merges them. A nearly-sorted array of a million elements sorts in close to O(n) time. Heap sort would churn through the full Θ(n log n) regardless. Timsort does not care about your clever worst-case proof.

Sorting a linked list is merge sort's natural territory. Find the midpoint with slow and fast pointers, merge two sorted lists by rewiring next pointers, and you need no auxiliary array at all. The only extra memory is O(log n) stack frames for the recursion depth. Heap sort requires converting the list to an array first.

External sorting (when data doesn't fit in RAM and lives on disk) has always been merge sort territory. The algorithm's sequential access patterns translate directly to sequential disk reads. Even a slow spinning disk can sustain sequential reads at hundreds of MB/s. Random access patterns, which heap sort requires, drop throughput to a few MB/s. On a disk, "cache miss" costs milliseconds instead of nanoseconds. Heap sort on disk is a serious bad time.

Heap sort owns one niche: guaranteed O(n log n) worst-case, O(1) auxiliary space, no recursion. On embedded systems with kilobytes of RAM, that matters. It's also the only comparison sort that guarantees O(n log n) worst-case without extra memory, which is why it earns its place in introsort as the backstop that prevents quicksort from going quadratic.

Decision flowchart: Need stable sort? yes → merge sort. Linked list input? yes → merge sort. Hard O(1) space limit? yes → heap sort. General case → introsort/Timsort. Side note: Python, Java, C++ stdlib defaults shown. Most of the time, you end up at "introsort/Timsort" and never have to think about this again. But when you hit the edge cases, this flowchart is what keeps you from picking the wrong tool.

Summary

Both algorithms run Θ(n log n) in every case. Neither is adaptive to partially sorted input by default. (Timsort, a merge sort variant, is.)
Merge sort needs O(n) auxiliary space. Heap sort needs O(1).
Merge sort is stable. Heap sort is not and cannot be without O(n) extra space.
Merge sort is 2 to 4 times faster in practice, because sequential access patterns let the CPU prefetch correctly and heap sort's random jumps do not.
Build-heap runs in Θ(n) via the geometric series argument. The O(n log n) bound comes entirely from the extraction phase.
Heap sort lives in introsort as a worst-case guarantee and on memory-constrained embedded targets.

For linked lists, external data, and any case where stability matters, choose merge sort. For O(1)-space guarantees with no recursion, heap sort earns its spot.

Knowing the answer at your desk and explaining it clearly on the spot are different skills. If you want to rehearse that explanation under interview pressure, SpaceComplexity runs realistic voice-based mock interviews with rubric-based feedback.

For related context on sorting comparisons and complexity analysis, see the merge sort vs quicksort deep dive, the stable sort explainer, and the heap sort complexity breakdown.