Quicksort vs Merge Sort: The Tradeoffs Behind the Same O(n log n)

- Structural inverses: merge sort puts all work in the combine step; quicksort puts it in partition, where combine is free
- Space: merge sort requires an O(n) auxiliary buffer; quicksort averages O(log n) call stack depth
- Worst case: merge sort guarantees O(n log n) on every input; quicksort degrades to O(n²) on sorted data with a fixed pivot
- Stability: merge sort is stable, quicksort is not, which is why Java uses Timsort for objects and dual-pivot quicksort for primitives
- Real-world speed: quicksort's sequential partition access beats merge sort's three-region buffer thrashing by 2 to 3x on real hardware
- Production sort: introsort starts as quicksort and falls back to heapsort at depth 2 log n, giving guaranteed O(n log n) worst case
- Decision rule: choose merge sort for linked lists, multi-key stability, or external sorts; quicksort for everything else
Textbooks list them together. Both are divide-and-conquer, both average O(n log n), both among the most studied algorithms ever written. Then Java ships with two different sorting functions: one for integers and one for objects, and they use completely different algorithms. That is not a quirk. That is a confession.
The two algorithms look similar from a distance. Up close they are structural inverses. Understanding the quicksort vs merge sort tradeoffs makes the choice obvious every time it comes up.
They Are Structural Inverses of Each Other
Both algorithms split an array, sort the pieces, and combine them. Where they put the work is completely different.
Merge sort puts all the work in the combine step. Splitting is trivial: cut at the midpoint. The hard part is merging two sorted halves back into one, which requires an O(n) auxiliary buffer and linear passes through both halves.
Quicksort is the mirror image. The combine step costs nothing. Once you partition around a pivot, the pivot lands at its final sorted position and the two sides never interact again. All the work happens in the partition step.
# Quicksort: all work at partition, combine is free def quicksort(arr, lo, hi): if lo >= hi: return p = partition(arr, lo, hi) # pivot lands at its final index p quicksort(arr, lo, p - 1) quicksort(arr, p + 1, hi) # Merge sort: divide is free, all work at merge def mergesort(arr, lo, hi): if lo >= hi: return mid = (lo + hi) // 2 mergesort(arr, lo, mid) mergesort(arr, mid + 1, hi) merge(arr, lo, mid, hi) # O(n) buffer required here
The structural consequence: quicksort sorts in-place, merge sort does not.
The Partition Step Does All of Quicksort's Heavy Lifting
Quicksort's core operation picks a pivot and rearranges the array so every element smaller than the pivot ends up left of it, every larger element ends up right, and the pivot sits exactly where it belongs in the final sorted order. It never moves again.
Trace [5, 3, 1, 4, 2] with pivot = last element (Lomuto partition):
[5, 3, 1, 4, 2] pivot = 2
Scan left to right, push elements ≤ 2 into left zone:
5 > 2, skip
3 > 2, skip
1 ≤ 2, swap with boundary → [1, 3, 5, 4, 2]
4 > 2, skip
Place pivot at boundary + 1: [1, 2, 5, 4, 3]
↑ pivot at index 1, done forever
Left: [1] → base case
Right: [5, 4, 3] → pivot = 3
After partitioning right: [1, 2, 3, 4, 5]
Two recursive calls, no merging. The sorted order emerges entirely from pivot placements.
Merge sort: split is free, merge is expensive. Quicksort: partition is expensive, combine is free. The worst case (sorted input + last-element pivot) degenerates to a linear chain.
Where O(n log n) Comes From, and Where It Breaks Down
Merge sort's recurrence is the cleanest in algorithms. Every call splits at the midpoint, so both halves are exactly n/2, and the merge step costs Θ(n):
T(n) = 2T(n/2) + Θ(n)
Apply the Master Theorem: a = 2, b = 2, f(n) = n.
n^(log_b a) = n^(log₂ 2) = n^1 = n
f(n) = Θ(n^(log_b a)) → Case 2 → T(n) = Θ(n log n)
This bound holds on every input, always. No randomness, no best-case caveat. The midpoint split guarantees equal subproblems.
Quicksort's worst case is a sorted array with pivot chosen as the last element. The pivot always lands at one end. The data is already done. The algorithm does not notice.
T(n) = T(n-1) + T(0) + Θ(n) = T(n-1) + Θ(n)
= Θ(n) + Θ(n-1) + ... + Θ(1) = Θ(n²)
The average case uses indicator random variables. Label elements by rank z₁ < z₂ < ... < zₙ. Two elements zᵢ and zⱼ (i < j) are compared if and only if one of them is the first element from the range {zᵢ, zᵢ₊₁, ..., zⱼ} chosen as pivot. If any element between them is chosen first, they end up in different partitions and never meet. Each of the j − i + 1 elements is equally likely to be first, so:
Pr[zᵢ and zⱼ compared] = 2 / (j − i + 1)
E[comparisons] = Σᵢ<ⱼ 2/(j − i + 1) = 2(n+1)Hₙ − 4n ≈ 2n ln n
Where Hₙ is the nth harmonic number. Randomized quicksort does roughly 1.39 × n log₂ n comparisons on average. Merge sort does about n log₂ n. Quicksort does more comparisons but, as we will see, far fewer memory operations.
At n=1,000, the average cases are similar. The worst case is not. The sorted-input trap makes the O(n²) path very easy to stumble into with a naive implementation.
For more on solving recurrences and reading recursion trees, see [/blog/recursion-time-complexity].
Merge Sort Costs O(n) Space. Quicksort Does Not.
Merge sort's buffer is unavoidable. You cannot merge two sorted halves in-place without either O(n) space or an O(n log² n) time penalty. Any implementation that claims to be both in-place and O(n log n) stable is either not in-place or not O(n log n). If someone shows you one, check the fine print.
Quicksort's auxiliary space is just the call stack. Average depth is O(log n), so O(log n) space. The worst case is O(n) deep recursion on sorted input. Sedgewick's fix eliminates this: always recurse on the smaller partition first, then loop (or tail-call) on the larger. This guarantees O(log n) stack depth regardless of how bad the pivots are.
One exception inverts the conclusion. Merge sort on a linked list uses O(1) auxiliary space. Merging two sorted linked lists is just pointer rewiring, no buffer needed. Quicksort on a linked list is slow because partition requires touching elements by index, which costs O(n) per access. This is why LeetCode 148 (Sort List) has a standard O(n log n) O(1)-space merge sort solution, and quicksort would be awkward.
Stability Is Not Optional for Multi-Key Sorts
Merge sort is stable. Quicksort is not. Equal elements preserve their original relative order after merge sort. The merge step's tiebreaker (left-before-right) guarantees this. The <= in the merge condition is load-bearing.
if left[i] <= right[j]: # <= keeps equal left elements before right: stable arr[k] = left[i]
Quicksort's partition moves elements past each other without tracking original position. Equal elements can end up in any relative order.
Sort 10,000 records by salary, then by department. If the department sort is unstable, the salary ordering within each department gets scrambled. This exact scenario is why Java ships two algorithms.
This matters whenever you sort by multiple keys sequentially. Sort 10,000 employee records first by salary, then by department. The second sort must be stable, or the salary ordering within each department is scrambled. Java solves this by shipping two distinct algorithms: Arrays.sort(int[]) uses dual-pivot quicksort, and Arrays.sort(Object[]) uses Timsort. The type determines which property matters.
Stability is explored in full at [/blog/stable-sort-algorithm].
Why Quicksort Usually Wins on Real Hardware
Both algorithms do Θ(n log n) work. Quicksort is typically 2 to 3 times faster in practice. The reason is memory, not comparisons.
Quicksort's partition step accesses memory sequentially. Two indices scan toward each other across a contiguous array. Elements sit within a few cache lines of the next element the algorithm will touch. The hardware prefetcher detects the stride and pre-loads the next cache line before it is needed. It is very good at this. Your code rarely deserves the prefetcher's loyalty.
Merge sort's combine step touches three memory regions: two input halves and one output buffer. If the array is large enough that these regions span multiple cache lines (which happens well before n = 10,000), the processor has to load from DRAM repeatedly. L1 cache latency is around 1 ns. DRAM is around 80 ns. Merge sort's extra buffer forces more DRAM traffic.
Quicksort's sequential scan is a gift to the CPU's prefetcher. Merge sort's three-region pattern means more cache misses, especially once the array grows past L2 cache.
There is also the copy overhead. Merge sort writes every element to the auxiliary buffer and back. Quicksort only swaps elements that need to move. On sorted or nearly-sorted input (common in real workloads), quicksort's total swap count collapses to near zero, while merge sort copies everything regardless.
See [/blog/array-vs-linked-list-performance] for the cache hierarchy numbers and why sequential access dominates real-world runtime.
What Standard Libraries Actually Chose
Production sort implementations are hybrids, not textbook algorithms.
| Runtime | Primitive sort | Stable sort |
|---|---|---|
| Java | dual-pivot quicksort (Yaroslavskiy, 2009) | Timsort |
| C++ std::sort | introsort (Musser, 1997) | N/A |
| C++ std::stable_sort | N/A | merge sort-based |
| Python sorted() | N/A | Timsort |
| Go sort.Sort() | pdqsort | N/A |
Introsort is the dominant production variant for unstable sort. It starts as quicksort, counts recursion depth, and falls back to heapsort once depth exceeds 2 log₂ n. Small subarrays (typically n < 16) finish with insertion sort. The result is guaranteed O(n log n) worst case with quicksort's average-case cache behavior.
Dual-pivot quicksort uses two pivots, creating three partitions instead of two. Yaroslavskiy's 2009 implementation cuts comparisons from approximately 2n ln n to 1.9n ln n, about 5% fewer, with better cache behavior from the three-way partition pattern. Oracle adopted it as Java 7's default primitive sort in 2011.
Nobody uses textbook quicksort or textbook merge sort in a real sort function. Not because the textbook versions are wrong. They just leave 30% performance on the table for no good reason.
Quicksort vs Merge Sort: Which Do You Choose?
Most of the time, your language's standard library already made the right call. The question comes up when you are implementing a sort yourself, choosing between std::sort and std::stable_sort in C++, or debugging unexpected ordering after sorting a collection by multiple fields.
Four properties drive the decision:
- Need stability (multi-key sorts, sort-by-multiple-criteria): merge sort
- Sorting a linked list (no random access): merge sort
- External sorting, data larger than RAM (sequential disk I/O): merge sort
- In-memory sorting of primitive arrays: quicksort or introsort
If none of those constraints apply, quicksort will be faster.
Recap
- Both algorithms are O(n log n) average. Only merge sort guarantees it in the worst case.
- Merge sort: trivial divide, expensive merge, O(n) space, stable.
- Quicksort: expensive partition, trivial combine, O(log n) space average, unstable.
- Quicksort's sequential memory access beats merge sort's buffer thrashing on real hardware.
- Linked lists favor merge sort because merging needs only pointer rewiring, no buffer.
- Production implementations use introsort or dual-pivot quicksort, not textbook quicksort.
- Choose merge sort for stability, linked lists, or disk-based data. Quicksort for everything else.
Knowing the tradeoff on paper is one thing. Explaining it out loud to an interviewer who is probing whether you understand why, not just what, is another. SpaceComplexity runs voice-based DSA mock interviews and scores your reasoning in real time, so you find out where your explanation breaks down before it costs you an offer.