Quickselect Algorithm: Find Any Order Statistic in O(n) Without Sorting

May 26, 202627 min read
dsaalgorithmsinterview-prepdata-structures
Quickselect Algorithm: Find Any Order Statistic in O(n) Without Sorting
TL;DR
  • Quickselect finds the kth order statistic in O(n) expected time by partitioning and recursing into one subarray, not two
  • Random pivot selection drops worst-case O(n²) probability to effectively zero; fixed pivots on sorted input are the classic failure mode
  • Three-way partition (Dutch National Flag) is required for arrays with many duplicates, otherwise all-equal inputs degrade to O(n²)
  • Median of medians (BFPRT, 1973) guarantees O(n) worst case but carries ~6× larger constants; introselect uses it only as a fallback
  • Iterative quickselect uses O(1) space because the algorithm is always a tail call — just update lo or hi in a loop
  • std::nth_element and numpy.partition both ship introselect; prefer them over a hand-rolled implementation in production

You have a million numbers and you want the median. The obvious move: sort, then index the middle. That works. But you just did O(n log n) work when the answer needed O(n). The sorted positions of every other element were discarded the moment you read arr[n/2].

The quickselect algorithm is the fix. Same partition trick as Quicksort, but instead of recursing into both halves, you throw away the half that cannot contain your target. One pass narrows the search. The next pass narrows it again. The work at each level forms a geometric series that sums to O(n).

Tony Hoare published it in July 1961 as "Algorithm 65: Find" in Communications of the ACM, the same issue as Quicksort (Algorithm 64) and Partition (Algorithm 63). It has been in your standard library ever since. std::nth_element in C++ and numpy.partition in Python both run it.

Reach for Quickselect when you need the kth order statistic and sorting is overkill. Kth smallest, top-k, approximate median, percentiles, streaming statistics: if you can afford to rearrange the array, this is your O(n) path.


Gentleman frog in a suit: "Gentlemen, it is with great pleasure to inform you that I have implemented a sorting algorithm in production today."

You sorted. You only needed one rank. The frog approves. Quickselect would not have.


What Quickselect Does to Your Array

Quickselect is in-place. It rearranges the input and returns the element at position k (0-indexed).

When it finishes, arr[k] holds the element that would appear at index k in the sorted version. Everything in arr[0..k-1] is ≤ arr[k]. Everything in arr[k+1..n-1] is ≥ arr[k]. The order within those two regions is unspecified.

Input:  [7, 2, 4, 1, 6, 3, 5]    k = 2 (3rd smallest, 0-indexed)
After:  [1, 2, 3, 4, 7, 6, 5]
               ^  k=2

[1, 2] ≤ 3 ≤ [4, 7, 6, 5]   ← only this partition property is guaranteed

The partial-order guarantee is exactly what makes it fast. You never commit to sorting either half.


One Partition Step Settles the Pivot's Rank Forever

The core operation: pick a pivot, rearrange the subarray so all elements smaller than the pivot are on its left and all elements larger are on its right. The pivot lands at its exact rank in the sorted order.

After partition, the pivot's final index is its exact rank. This is not an approximation. If the pivot lands at index p, then arr[p] is provably the (p+1)th smallest element in the subarray.

Array:  [3, 6, 1, 5, 2, 4]
                          ↑ choose 4 as pivot

After partition:
  [ 3, 1, 2 | 4 | 6, 5 ]
    -------       -----
    all ≤ 4       all ≥ 4
    pivot is permanently at index 3

Quicksort recurses into BOTH sides. Quickselect compares the pivot's index p to k:

  • p == k: done. arr[p] is the answer.
  • p < k: the answer is in the right subarray. Discard everything at or left of p.
  • p > k: the answer is in the left subarray. Discard everything at or right of p.

The discarded half is gone. No information about the answer is lost because the pivot's rank is exact.

Quickselect partition step: before and after, showing pivot landing at its exact rank with elements less-than on the left and greater-than on the right

After one partition, the pivot is home. It never moves again.

                    [3, 6, 1, 5, 2, 4]   k=2
                           |
                   partition, pivot=4 at p=3
                           |
              k < p, so recurse left only
                           ↓
                      [3, 1, 2]   k=2
                           |
                  partition, pivot=2 at p=1
                           |
              k > p, so recurse right only
                           ↓
                        [3]   k=2 (single element)
                           |
                      return 3 ✓

Three Passes in Action

A concrete trace. We want the 3rd smallest (k=2) from [3, 6, 1, 5, 2, 4]. Sorted order is [1, 2, 3, 4, 5, 6] so the answer is 3.

Pass 1. Random pivot = 4 (value), currently at index 5. Lomuto partition with lo=0, hi=5:

Initial:    [3, 6, 1, 5, 2, 4]   pivot = arr[5] = 4
            Scan left-to-right, swap elements ≤ 4 to the front:
            3 ≤ 4 → move  → [3, ...]  i=1
            6 > 4 → skip
            1 ≤ 4 → move  → [3, 1, ...] i=2
            5 > 4 → skip
            2 ≤ 4 → move  → [3, 1, 2, ...] i=3
            Final: swap arr[3] ↔ arr[5]
Result:     [3, 1, 2, 4, 6, 5]   pivot at p=3

p=3 > k=2, so recurse left into arr[0..2] = [3, 1, 2].

Pass 2. Random pivot = 2, at index 2 within [3, 1, 2]:

Partition [3, 1, 2] with pivot=2 (arr[hi]):
  3 > 2 → skip
  1 ≤ 2 → move  → [1, 3, ...] i=1
  Final: swap arr[1] ↔ arr[2]
Result:  [1, 2, 3]   pivot at p=1 (global)

p=1 < k=2, so lo = 2. Now lo == hi == 2. Loop exits.

Return arr[2] = 3. Correct.


Lomuto or Hoare? Two Partitions, Same Goal

Two standard implementations. Both correct. Different tradeoffs.

Lomuto (clearer, slightly more swaps):

def partition(arr, lo, hi): # Random pivot: swap a random element to arr[hi] first pivot_idx = random.randint(lo, hi) arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx] pivot = arr[hi] i = lo for j in range(lo, hi): if arr[j] <= pivot: arr[i], arr[j] = arr[j], arr[i] i += 1 arr[i], arr[hi] = arr[hi], arr[i] return i # ← pivot is now exactly at index i

After the call, the pivot IS at index i. Use it directly as the pivot's rank in quickselect.

Hoare (roughly 3× fewer swaps, trickier correctness):

def hoare_partition(arr, lo, hi): pivot = arr[lo] i, j = lo - 1, hi + 1 while True: i += 1 while arr[i] < pivot: i += 1 j -= 1 while arr[j] > pivot: j -= 1 if i >= j: return j # ← split point, NOT the pivot's index arr[i], arr[j] = arr[j], arr[i]

Hoare returns a split point j, not the pivot's index. arr[lo..j] ≤ pivot and arr[j+1..hi] ≥ pivot, but the pivot could be anywhere in the left half. With Hoare, you recurse on [lo, j] and [j+1, hi], never on [lo, j-1]. Using j-1 is the most common off-by-one bug in quickselect implementations. It looks right. It is wrong. It will haunt you.

For the implementations below, Lomuto is used: the pivot's exact rank is immediately available, the "found it" early-exit (p == k) is unambiguous, and random pivot selection works cleanly.

One rule regardless of partition scheme: always choose the pivot randomly by swapping a random index to the end before partitioning. Without this, a sorted array with a fixed last-element pivot hits O(n²) every time.


Why O(n)? The Geometric Series Argument

Call a pivot "good" if it lands in the middle 50% of values in the current subarray, meaning its rank falls between n/4 and 3n/4. A random pivot is good with probability ≥ 1/2. The expected number of pivots needed before getting a good one is 2 (geometric distribution with p=1/2).

When you get a good pivot, the next subproblem is at most 3n/4. So the expected work per level is:

Level 0: O(n)          ← full partition scan
Level 1: O(3n/4)       ← good pivot guaranteed at most 3n/4
Level 2: O(9n/16)
Level 3: O(27n/64)
...

Sum: O(n) × (1 + 3/4 + 9/16 + ...) = O(n) × 1/(1 − 3/4) = O(4n) = O(n).

The geometric series converges because each level shrinks the problem by at least a constant factor. Quicksort recurses into BOTH halves, so the total work per level stays O(n) and you get O(n log n). Quickselect recurses into ONE half, so the levels converge.

Quickselect vs Quicksort recursion: Quicksort expands into a full binary tree touching O(n) per level for O(log n) levels; Quickselect follows a single path, each step shrinking by 3/4, giving a geometric series summing to O(n)

Quicksort is a full binary tree touching O(n) at every level. Quickselect is a single path. That's the whole difference.

Geometric series bar chart: each level shrinks to at most 3/4 of the previous, total work sums to 4n

Each bar is one recursion level. They shrink by 3/4 every time. The series converges to 4n.

The CLRS formal analysis (RANDOMIZED-SELECT, Chapter 9.2) confirms this: the expected number of comparisons is at most 4n. The exact formula for the median case (k=n/2) is approximately 3.3863n for large n.


When the Pivot Betrays You: O(n²) Worst Case

The worst case: the pivot is always the minimum or maximum of the current subarray.

T(n) = T(n-1) + O(n)
     = O(n) + O(n-1) + O(n-2) + ... + O(1)
     = O(n²)

This happens with:

  • Fixed first-element pivot on an already-sorted array
  • Fixed last-element pivot (Lomuto) on a reverse-sorted array
  • Median-of-3 pivot on Musser's adversarial "killer" sequences (designed specifically to defeat median-of-3 selection)

With a random pivot, the probability of O(n²) behavior is O(1/n!), effectively zero in practice. An adversary would need to know your random seed to construct a bad input. And if your adversary knows your random seed, you have larger problems than sorting.

All-equal arrays are a separate failure mode covered below.


Duplicates Will Destroy the Basic Version

An array of all-equal elements hits O(n²) with standard two-way partition. Every element compares equal to the pivot. Lomuto puts all elements to the left (since arr[j] <= pivot is always true), so the pivot lands at position n-1. Recurrence: T(n) = T(n-1) + O(n) = O(n²).

The fix is three-way partition, Dijkstra's Dutch National Flag algorithm. It maintains three regions: [less | equal | unexamined | greater].

THREE-WAY-PARTITION(arr, lo, hi):
    pivot = arr[lo]    # or random
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if arr[i] < pivot:   swap arr[lt], arr[i]; lt++; i++
        elif arr[i] > pivot: swap arr[i], arr[gt]; gt--
        else: i++
    # arr[lo..lt-1] < pivot
    # arr[lt..gt] == pivot   ← ALL equal elements placed in one pass
    # arr[gt+1..hi] > pivot
Before: [3, 1, 2, 3, 4, 3, 3]   pivot=3, k=4
         lt=i=0, gt=6
         ...after one pass...
After:  [1, 2, 3, 3, 3, 3, 4]
              ^           ^
              lt=2        gt=5

If k falls within [lt, gt], return immediately. Otherwise recurse into the appropriate smaller-than or greater-than region. For an all-equal array, one pass handles everything: O(n).

Dutch National Flag three-way partition: before and after, showing three colored regions for less-than, equal, and greater-than the pivot

One pass. Three regions. All duplicates handled. The standard two-region partition never stood a chance against [3,3,3,3,3].


The Guaranteed O(n) Fallback: Median of Medians

Randomized quickselect is O(n) expected, not worst-case. The BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) gives a hard O(n) guarantee by choosing a pivot that always splits at least 30-70.

The trick: find a pivot guaranteed to rank between the 30th and 70th percentile, no matter the input.

Step 1: divide into groups of 5
  [3 1 4 1 5] → sorted → median = 3
  [9 2 6 5 3] → sorted → median = 5
  [5 8 9 7 9] → sorted → median = 8
  [3 2 3 8 4] → sorted → median = 3
  [6 2 6 4 3] → sorted → median = 4

Step 2: collect medians → [3, 5, 8, 3, 4]
Step 3: recursively find median of medians → 4

Step 4: use 4 as pivot
  → at least 30% of elements ≤ 4
  → at least 30% of elements ≥ 4
  → partition is at most 70-30

Why groups of 5 specifically? The branching coefficients are 1/5 (recursive call on medians) + 7/10 (worst-case recursive call on partition) = 9/10 < 1. That's what makes the recurrence converge:

T(n) ≤ T(n/5) + T(7n/10 + 6) + O(n) = O(n)

Groups of 3 fail: coefficients become 1/3 + 2/3 = 1, giving O(n log n). Groups of 5 is the minimum size where the geometry works and sorting each group is still O(1).

Median of medians: five groups of five elements, each sorted to extract a median, then those medians are recursively searched for their median, producing a pivot guaranteed between the 30th and 70th percentile

Split into groups of 5. Sort each group. Take the medians. Find the median of those medians. Use it as your pivot. Congratulations, you now have guaranteed O(n) worst case and a constant factor six times worse than the randomized version.

The catch: BFPRT uses roughly 22n comparisons in the worst case, versus ~3.4n expected for randomized quickselect. That ~6× overhead makes it impractical standalone. In practice, the standard library uses introselect: start with randomized quickselect, fall back to a guaranteed-linear algorithm if the recursion depth exceeds O(log n). That's what std::nth_element and numpy.partition implement.


How Each Operation Gets Its Complexity

OperationTime (avg)Time (worst)Space (avg)Space (worst)
kth order statistic (recursive)O(n)O(n²)O(log n)O(n)
kth order statistic (iterative)O(n)O(n²)O(1)O(1)
kth order statistic (BFPRT)O(n)O(n)O(log n)O(log n)
kth order statistic (introselect)O(n)O(n) or O(n log n)*O(log n)O(log n)

*libstdc++ std::nth_element falls back to heapselect (O(n log n) worst case), not pure median-of-medians. numpy claims O(n) worst case. The C++ standard requires only "average linear," not worst-case linear.

Space is O(log n) on average because the recursive call stack depth equals the number of partition rounds. With balanced pivots, that's O(log n) rounds. The iterative version collapses this to O(1) because quickselect is always a tail call: there's only ever one recursive call and it's the last thing done. Just update lo or hi in a loop.

The partition itself uses O(1) extra space. No auxiliary arrays.


One Structure, Every Language

import random def quickselect(nums: list[int], k: int) -> int: """Returns the kth smallest element (0-indexed). Mutates a copy.""" arr = nums[:] lo, hi = 0, len(arr) - 1 while lo < hi: p = _partition(arr, lo, hi) if p == k: return arr[k] elif p < k: lo = p + 1 else: hi = p - 1 return arr[lo] def _partition(arr: list[int], lo: int, hi: int) -> int: pivot_idx = random.randint(lo, hi) arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx] pivot = arr[hi] i = lo for j in range(lo, hi): if arr[j] <= pivot: arr[i], arr[j] = arr[j], arr[i] i += 1 arr[i], arr[hi] = arr[hi], arr[i] return i # Find the 3rd smallest (k=2) print(quickselect([7, 2, 4, 1, 6, 3, 5], 2)) # 3 # numpy ships introselect if you need it: # import numpy as np # a = np.array([7, 2, 4, 1, 6, 3, 5]) # np.partition(a, 2)[2] # 3

What Classes of Problems Call for This

Quickselect is the right tool when:

You need a specific rank from an unsorted array. Any percentile: minimum (k=0), maximum (k=n-1), median (k=n/2), 90th percentile (k=0.9n). One rank, one pass.

You need the top-k or bottom-k elements, not sorted. Run quickselect with rank k. After the call, arr[0..k-1] contains the k smallest elements in any order. No sorting needed.

Sorting feels like overkill. The brute force is "sort, then index," and you can see the O(n log n) work being thrown away. Full array, one rank, no output order required: that's the quickselect shape.

Quickselect is the wrong tool when:

  • Multiple queries on the same array: sort once, then answer in O(1) each.
  • Streaming queries after each insertion: use a max-heap and min-heap pair for dynamic median tracking in O(log n) per insertion.
  • The input cannot be mutated: you need an O(n) copy, which is fine asymptotically but worth noting.
  • The array is already sorted: indexing directly is O(1).

Signals That Quickselect Is the Right Tool

Signal 1: The problem asks for a rank and sorting "should work" but feels wasteful.

The problem says "kth largest," "find the median," or "return the top k elements." The input is given all at once. You notice that a sort-then-index approach gives O(n log n), and the problem constraints hint that O(n) is expected. That's the cue.

Signal 2: The brute force is sort() then [k], and you can see that sorted order is thrown away.

If you immediately reach for sort and only read one index, the sorted positions of all other elements are waste. Quickselect eliminates that waste.


Worked Example 1: Kth Largest Element (LeetCode 215)

Problem: given an unsorted array, return the kth largest element. Constraints imply O(n) or O(n log n). No specific sorted output required.

"Kth largest" is "kth smallest from the right." In 0-indexed terms, that's the (n-k)th smallest.

def findKthLargest(nums: list[int], k: int) -> int: return quickselect(nums[:], len(nums) - k)

Expected O(n). The heap approach (heapq.nlargest(k, nums)) runs in O(n log k), which is O(n log n) when k is proportional to n. Quickselect wins for large k.

No sorted output, single rank, full array available at once. Textbook fit.


Worked Example 2: Find the Median of an Unsorted Array

Problem: given n numbers, find the median in O(n) time.

def find_median(nums: list[int]) -> float: n = len(nums) if n % 2 == 1: return float(quickselect(nums[:], n // 2)) else: arr = nums[:] lo = quickselect(arr, n // 2 - 1) hi = quickselect(arr, n // 2) # arr already partially sorted from first call return (lo + hi) / 2.0

For even-length arrays, two calls are needed. Each is O(n), so the total is still O(n). The second call on an already partially sorted array tends to be faster in practice, since the first call left arr[0..n/2-1] ≤ median.

Sorting would cost O(n log n). A min-heap scan costs O(n log n). Quickselect is the one path to O(n).

"Find something about rank in an unsorted array without needing sorted output" almost always lands here.


Recap

  • Quickselect finds the kth order statistic in O(n) expected time by partitioning and recursing into ONE subarray instead of two.
  • After partition, the pivot is at its exact sorted rank. Discarding the other half loses zero information.
  • Average O(n) follows from a geometric series: each level shrinks by at least a constant factor, and the levels sum like 1 + 3/4 + 9/16 + ... = 4.
  • Worst-case O(n²) happens when the pivot is always the min or max. A random pivot makes this probability O(1/n!).
  • The iterative version uses O(1) space: quickselect is always a tail call, so just update lo or hi in a loop.
  • Three-way partition (Dutch National Flag) is required for arrays with many duplicate elements, otherwise all-equal arrays hit O(n²).
  • Median of medians (BFPRT, 1973) guarantees O(n) worst case but with ~6× larger constants. Used as a fallback in introselect.
  • std::nth_element (C++) and numpy.partition both ship introselect. Use them in production.
  • Quickselect mutates the array. Copy first if you need the original preserved.
  • Hoare's scheme does ~3× fewer swaps than Lomuto but requires careful recursion bounds: [lo, j] and [j+1, hi], NOT [lo, j-1].

If you want to practice explaining the geometric series argument under pressure and get real-time feedback on whether your reasoning landed, SpaceComplexity runs voice-based mock interviews with rubric-based scoring on algorithm depth. Worth running through before the actual interview.


The same partition step that makes quickselect fast also makes quicksort fast in different ways. The merge sort vs quicksort comparison walks through why recursing into both halves costs O(n log n) while one half costs O(n). For the alternative approach to top-k problems, the heap data structure explains the O(n log k) path and when you'd prefer it over O(n) quickselect. The "eliminate half the search space" reasoning behind quickselect's geometric series is the same intuition that makes binary search O(log n).

Further Reading