Comparison Sort Lower Bound: O(n log n) Is the Fastest You Can Sort. Here's Why.

You can write the cleverest sorting algorithm in the world. Optimize cache access, squeeze constants, parallelize, add SIMD intrinsics, rename variables to single letters because it feels faster. None of it matters. If your algorithm compares elements to decide their order, you cannot sort n items in fewer than O(n log n) comparisons. Not on average. Not in the best case. Not with a lucky seed. Never.

Someone, somewhere, is currently convinced they found an O(n) comparison sort. They will submit it to a forum and get ratio'd by a mathematician. This post explains exactly why they're wrong, with a proof short enough to fit on a whiteboard and rigorous enough to deliver with confidence in an interview.

What a Comparison Sort Is Actually Allowed to Do

A comparison sort determines element order exclusively by comparing pairs: is a[i] less than a[j]? Yes or no. Merge sort, quicksort, heapsort, bubble sort, insertion sort. All comparison sorts.

The comparison model restricts what the algorithm is allowed to know about the data. It cannot use element values as indices. It cannot hash them. It cannot read them as addresses, primes, or Unicode folklore. It can only ask: which of these two is bigger?

Imagine sorting a stack of cards while blindfolded, holding two at a time, asking "is this one heavier?" That's the entire model. This restriction is exactly what makes the lower bound provable. No restriction, no proof. No proof, no claim. The constraint is load-bearing.

The Decision Tree That Proves You're Trapped

Trace every comparison an algorithm makes on a particular input. At each step, it asks "is a[i] less than a[j]?" and branches left if yes, right if no. This produces a binary tree: one branch per comparison, until the algorithm reaches a leaf that names a sorted order.

Every possible input ordering must reach a different leaf. If two distinct permutations reached the same leaf, the algorithm made identical choices for both and cannot have sorted both correctly. So the tree must have at least one leaf per possible input permutation.

For n elements, there are n! possible permutations. At least n! leaves.

A binary tree of height h has at most 2^h leaves. So:

2^h >= n!
h >= log₂(n!)

The height is the worst-case comparison count. The lower bound on sorting time is log₂(n!). The tree cannot be shallower. No pivot trick, no cache trick, no rebranding of bubble sort as "adjacent comparison sort" gets around this.

Decision tree for sorting 3 elements, showing binary branching to 6 leaves

Every input permutation needs its own leaf. With n! permutations, the tree must be this deep.

Three Elements, Six Orderings, Zero Escape Routes

For three elements [a, b, c], there are 3! = 6 possible orderings. The decision tree needs at least 6 leaves.

A binary tree with 6 leaves has height at least 3, since 2² = 4 is too few but 2³ = 8 covers it. Any comparison sort for 3 elements needs at least 3 comparisons in the worst case. Try to shortcut it and you get two permutations you cannot distinguish. The algorithm makes the same choices for both. One of them gets the wrong answer.

Insertion sort on 3 elements uses exactly 3 comparisons worst case. Optimal for n = 3. As n grows, the minimum number of comparisons grows with it. It grows as n log n.

log₂(n!) Is Actually Θ(n log n)

This step converts the bound from "log of a factorial" into something you can actually compare against algorithm runtimes.

Throw away the first half of the sum. The remaining n/2 terms are all at least log₂(n/2):

log₂(n!) = log₂(1) + log₂(2) + ... + log₂(n)
         >= log₂(n/2) + ... + log₂(n)     [drop smaller first half]
         >= (n/2) · log₂(n/2)
          = (n/2) · (log₂(n) - 1)
          = Ω(n log n)

Every comparison sort must make at least Ω(n log n) comparisons in the worst case. No exceptions. Stirling's approximation fills in the other direction: log₂(n!) = n log₂(n) - n log₂(e) + O(log n), which is also O(n log n). The bound is tight. You cannot go below, and it is actually achievable.

Who Actually Achieves the Bound?

Tight means achievable, not just a floor that nothing reaches.

Merge sort always does O(n log n) comparisons, worst case and average. It achieves the lower bound unconditionally. No bad inputs, no bad days. You could hand merge sort the most adversarial array you can construct and it would still perform on-bound.

Heapsort is O(n log n) worst case. Same theoretical guarantee, somewhat worse cache behavior in practice, because heapsort was apparently designed without regard for spatial locality.

Quicksort is the complicated one. O(n log n) on average, O(n²) worst case. Hand it a sorted array with naive pivot selection and watch it degrade spectacularly. With randomized pivot selection it hits O(n log n) with high probability, which in practice means it's fine. It just comes with an asterisk that merge sort does not.

Merge sort and heapsort are the genuinely optimal comparison sorts. You can battle over constants and cache performance. The n log n shape is not negotiable.

The Escape Hatch (There Is One. Read the Fine Print.)

The lower bound proof has one critical assumption: the algorithm learns order exclusively through comparisons. Remove that assumption and the bound goes away entirely.

Counting sort uses element values as array indices. Count occurrences, reconstruct output. No comparisons. O(n + k) where k is the value range. For small k, that is linear time.

Radix sort sorts numbers digit by digit using counting sort at each pass. O(d(n + b)) where d is the digit count and b is the base. For fixed-width integers, O(n).

Bucket sort distributes elements into buckets by value range, then sorts each bucket. O(n) when elements are uniformly distributed.

None of these are magic. Each one trades a constraint on the data for speed. Counting sort needs integers in a bounded range. Radix sort needs fixed-length keys. Bucket sort needs a predictable distribution. If your data satisfies those constraints, you can beat O(n log n). If it doesn't, you cannot.

This is why "I can sort in O(n)" deserves a follow-up question before any celebration. What are you assuming about the data? The escape hatch is real. The fine print is load-bearing.

Why This Comes Up in Every Serious Interview

The trap answer to "what's the time complexity of sorting?" is "O(n log n)." That's correct in a narrow sense. The complete answer is: "O(n log n) for comparison sorts, by the decision tree lower bound. With additional constraints on the data, counting sort or radix sort can get us to linear time."

That addition signals you know where the bound comes from, not just what it is. Interviewers writing feedback can tell the difference. One produces a one-line debrief. The other produces a paragraph with language like "candidate explained the information-theoretic argument without prompting."

A follow-up you might face: "Could we sort n integers in [1, n²] in O(n) time?" Yes. Radix sort with base n handles it in two passes, each O(n). The range is bounded. The escape hatch applies.

Or: "Why can't we sort arbitrary objects faster than O(n log n)?" Because arbitrary objects can only be compared. No numerical structure to exploit as indices. No bounded range to count into. The decision tree argument applies in full. n! orderings need n! leaves. n! leaves need height log₂(n!). Stirling says that is Θ(n log n).

Every Complexity Claim Reads Differently After This

Any O(n log n) comparison sort is optimal. Any comparison sort claiming O(n) is wrong unless it is hiding a comparison somewhere. Any non-comparison sort claiming O(n) carries a hidden constraint on the data worth surfacing.

The sorting algorithms cheat sheet lists time and space for every common algorithm. Cross-reference those numbers against the lower bound: merge sort and heapsort sit exactly at it, quicksort sits there on average, and the O(n) sorts all carry data constraints that make the bound inapplicable.

The same logic appears elsewhere. Binary search's O(log n) is optimal for ordered search by a near-identical argument. You start with n possible positions. Each comparison eliminates half. You need log₂(n) comparisons. Lower bound, achieved, done. The pattern is log applied to the number of possibilities, and once you see it, you see it everywhere.

The One-Line Version

Each comparison halves your uncertainty about the correct permutation. You start with n! possibilities. Each yes/no question eliminates half. You need at least log₂(n!) questions, and log₂(n!) is Θ(n log n).

Decision tree, n! leaves, height log₂(n!), Stirling says Θ(n log n). Deliver that chain out loud without hesitation and you've answered the question better than most candidates will. The argument is short, clean, and fits on the whiteboard with room to spare.

If you want to practice delivering proofs like this under actual interview conditions, where an interviewer is watching and waiting for you to fumble, SpaceComplexity runs voice-based mock interviews where an AI pushes back on your reasoning the way a real interviewer would. The gap between knowing an argument on paper and delivering it under pressure is larger than it looks.