Quickselect Algorithm: Find Any Order Statistic in O(n) Without Sorting

- Quickselect finds the kth order statistic in O(n) expected time by partitioning and recursing into one subarray, not two
- Random pivot selection drops worst-case O(n²) probability to effectively zero; fixed pivots on sorted input are the classic failure mode
- Three-way partition (Dutch National Flag) is required for arrays with many duplicates, otherwise all-equal inputs degrade to O(n²)
- Median of medians (BFPRT, 1973) guarantees O(n) worst case but carries ~6× larger constants; introselect uses it only as a fallback
- Iterative quickselect uses O(1) space because the algorithm is always a tail call — just update lo or hi in a loop
std::nth_elementandnumpy.partitionboth ship introselect; prefer them over a hand-rolled implementation in production
You have a million numbers and you want the median. The obvious move: sort, then index the middle. That works. But you just did O(n log n) work when the answer needed O(n). The sorted positions of every other element were discarded the moment you read arr[n/2].
The quickselect algorithm is the fix. Same partition trick as Quicksort, but instead of recursing into both halves, you throw away the half that cannot contain your target. One pass narrows the search. The next pass narrows it again. The work at each level forms a geometric series that sums to O(n).
Tony Hoare published it in July 1961 as "Algorithm 65: Find" in Communications of the ACM, the same issue as Quicksort (Algorithm 64) and Partition (Algorithm 63). It has been in your standard library ever since. std::nth_element in C++ and numpy.partition in Python both run it.
Reach for Quickselect when you need the kth order statistic and sorting is overkill. Kth smallest, top-k, approximate median, percentiles, streaming statistics: if you can afford to rearrange the array, this is your O(n) path.

You sorted. You only needed one rank. The frog approves. Quickselect would not have.
What Quickselect Does to Your Array
Quickselect is in-place. It rearranges the input and returns the element at position k (0-indexed).
When it finishes, arr[k] holds the element that would appear at index k in the sorted version. Everything in arr[0..k-1] is ≤ arr[k]. Everything in arr[k+1..n-1] is ≥ arr[k]. The order within those two regions is unspecified.
Input: [7, 2, 4, 1, 6, 3, 5] k = 2 (3rd smallest, 0-indexed)
After: [1, 2, 3, 4, 7, 6, 5]
^ k=2
[1, 2] ≤ 3 ≤ [4, 7, 6, 5] ← only this partition property is guaranteed
The partial-order guarantee is exactly what makes it fast. You never commit to sorting either half.
One Partition Step Settles the Pivot's Rank Forever
The core operation: pick a pivot, rearrange the subarray so all elements smaller than the pivot are on its left and all elements larger are on its right. The pivot lands at its exact rank in the sorted order.
After partition, the pivot's final index is its exact rank. This is not an approximation. If the pivot lands at index p, then arr[p] is provably the (p+1)th smallest element in the subarray.
Array: [3, 6, 1, 5, 2, 4]
↑ choose 4 as pivot
After partition:
[ 3, 1, 2 | 4 | 6, 5 ]
------- -----
all ≤ 4 all ≥ 4
pivot is permanently at index 3
Quicksort recurses into BOTH sides. Quickselect compares the pivot's index p to k:
p == k: done.arr[p]is the answer.p < k: the answer is in the right subarray. Discard everything at or left of p.p > k: the answer is in the left subarray. Discard everything at or right of p.
The discarded half is gone. No information about the answer is lost because the pivot's rank is exact.

After one partition, the pivot is home. It never moves again.
[3, 6, 1, 5, 2, 4] k=2
|
partition, pivot=4 at p=3
|
k < p, so recurse left only
↓
[3, 1, 2] k=2
|
partition, pivot=2 at p=1
|
k > p, so recurse right only
↓
[3] k=2 (single element)
|
return 3 ✓
Three Passes in Action
A concrete trace. We want the 3rd smallest (k=2) from [3, 6, 1, 5, 2, 4]. Sorted order is [1, 2, 3, 4, 5, 6] so the answer is 3.
Pass 1. Random pivot = 4 (value), currently at index 5. Lomuto partition with lo=0, hi=5:
Initial: [3, 6, 1, 5, 2, 4] pivot = arr[5] = 4
Scan left-to-right, swap elements ≤ 4 to the front:
3 ≤ 4 → move → [3, ...] i=1
6 > 4 → skip
1 ≤ 4 → move → [3, 1, ...] i=2
5 > 4 → skip
2 ≤ 4 → move → [3, 1, 2, ...] i=3
Final: swap arr[3] ↔ arr[5]
Result: [3, 1, 2, 4, 6, 5] pivot at p=3
p=3 > k=2, so recurse left into arr[0..2] = [3, 1, 2].
Pass 2. Random pivot = 2, at index 2 within [3, 1, 2]:
Partition [3, 1, 2] with pivot=2 (arr[hi]):
3 > 2 → skip
1 ≤ 2 → move → [1, 3, ...] i=1
Final: swap arr[1] ↔ arr[2]
Result: [1, 2, 3] pivot at p=1 (global)
p=1 < k=2, so lo = 2. Now lo == hi == 2. Loop exits.
Return arr[2] = 3. Correct.
Lomuto or Hoare? Two Partitions, Same Goal
Two standard implementations. Both correct. Different tradeoffs.
Lomuto (clearer, slightly more swaps):
def partition(arr, lo, hi): # Random pivot: swap a random element to arr[hi] first pivot_idx = random.randint(lo, hi) arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx] pivot = arr[hi] i = lo for j in range(lo, hi): if arr[j] <= pivot: arr[i], arr[j] = arr[j], arr[i] i += 1 arr[i], arr[hi] = arr[hi], arr[i] return i # ← pivot is now exactly at index i
After the call, the pivot IS at index i. Use it directly as the pivot's rank in quickselect.
Hoare (roughly 3× fewer swaps, trickier correctness):
def hoare_partition(arr, lo, hi): pivot = arr[lo] i, j = lo - 1, hi + 1 while True: i += 1 while arr[i] < pivot: i += 1 j -= 1 while arr[j] > pivot: j -= 1 if i >= j: return j # ← split point, NOT the pivot's index arr[i], arr[j] = arr[j], arr[i]
Hoare returns a split point j, not the pivot's index. arr[lo..j] ≤ pivot and arr[j+1..hi] ≥ pivot, but the pivot could be anywhere in the left half. With Hoare, you recurse on [lo, j] and [j+1, hi], never on [lo, j-1]. Using j-1 is the most common off-by-one bug in quickselect implementations. It looks right. It is wrong. It will haunt you.
For the implementations below, Lomuto is used: the pivot's exact rank is immediately available, the "found it" early-exit (p == k) is unambiguous, and random pivot selection works cleanly.
One rule regardless of partition scheme: always choose the pivot randomly by swapping a random index to the end before partitioning. Without this, a sorted array with a fixed last-element pivot hits O(n²) every time.
Why O(n)? The Geometric Series Argument
Call a pivot "good" if it lands in the middle 50% of values in the current subarray, meaning its rank falls between n/4 and 3n/4. A random pivot is good with probability ≥ 1/2. The expected number of pivots needed before getting a good one is 2 (geometric distribution with p=1/2).
When you get a good pivot, the next subproblem is at most 3n/4. So the expected work per level is:
Level 0: O(n) ← full partition scan
Level 1: O(3n/4) ← good pivot guaranteed at most 3n/4
Level 2: O(9n/16)
Level 3: O(27n/64)
...
Sum: O(n) × (1 + 3/4 + 9/16 + ...) = O(n) × 1/(1 − 3/4) = O(4n) = O(n).
The geometric series converges because each level shrinks the problem by at least a constant factor. Quicksort recurses into BOTH halves, so the total work per level stays O(n) and you get O(n log n). Quickselect recurses into ONE half, so the levels converge.

Quicksort is a full binary tree touching O(n) at every level. Quickselect is a single path. That's the whole difference.

Each bar is one recursion level. They shrink by 3/4 every time. The series converges to 4n.
The CLRS formal analysis (RANDOMIZED-SELECT, Chapter 9.2) confirms this: the expected number of comparisons is at most 4n. The exact formula for the median case (k=n/2) is approximately 3.3863n for large n.
When the Pivot Betrays You: O(n²) Worst Case
The worst case: the pivot is always the minimum or maximum of the current subarray.
T(n) = T(n-1) + O(n)
= O(n) + O(n-1) + O(n-2) + ... + O(1)
= O(n²)
This happens with:
- Fixed first-element pivot on an already-sorted array
- Fixed last-element pivot (Lomuto) on a reverse-sorted array
- Median-of-3 pivot on Musser's adversarial "killer" sequences (designed specifically to defeat median-of-3 selection)
With a random pivot, the probability of O(n²) behavior is O(1/n!), effectively zero in practice. An adversary would need to know your random seed to construct a bad input. And if your adversary knows your random seed, you have larger problems than sorting.
All-equal arrays are a separate failure mode covered below.
Duplicates Will Destroy the Basic Version
An array of all-equal elements hits O(n²) with standard two-way partition. Every element compares equal to the pivot. Lomuto puts all elements to the left (since arr[j] <= pivot is always true), so the pivot lands at position n-1. Recurrence: T(n) = T(n-1) + O(n) = O(n²).
The fix is three-way partition, Dijkstra's Dutch National Flag algorithm. It maintains three regions: [less | equal | unexamined | greater].
THREE-WAY-PARTITION(arr, lo, hi):
pivot = arr[lo] # or random
lt, i, gt = lo, lo, hi
while i <= gt:
if arr[i] < pivot: swap arr[lt], arr[i]; lt++; i++
elif arr[i] > pivot: swap arr[i], arr[gt]; gt--
else: i++
# arr[lo..lt-1] < pivot
# arr[lt..gt] == pivot ← ALL equal elements placed in one pass
# arr[gt+1..hi] > pivot
Before: [3, 1, 2, 3, 4, 3, 3] pivot=3, k=4
lt=i=0, gt=6
...after one pass...
After: [1, 2, 3, 3, 3, 3, 4]
^ ^
lt=2 gt=5
If k falls within [lt, gt], return immediately. Otherwise recurse into the appropriate smaller-than or greater-than region. For an all-equal array, one pass handles everything: O(n).

One pass. Three regions. All duplicates handled. The standard two-region partition never stood a chance against [3,3,3,3,3].
The Guaranteed O(n) Fallback: Median of Medians
Randomized quickselect is O(n) expected, not worst-case. The BFPRT algorithm (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) gives a hard O(n) guarantee by choosing a pivot that always splits at least 30-70.
The trick: find a pivot guaranteed to rank between the 30th and 70th percentile, no matter the input.
Step 1: divide into groups of 5
[3 1 4 1 5] → sorted → median = 3
[9 2 6 5 3] → sorted → median = 5
[5 8 9 7 9] → sorted → median = 8
[3 2 3 8 4] → sorted → median = 3
[6 2 6 4 3] → sorted → median = 4
Step 2: collect medians → [3, 5, 8, 3, 4]
Step 3: recursively find median of medians → 4
Step 4: use 4 as pivot
→ at least 30% of elements ≤ 4
→ at least 30% of elements ≥ 4
→ partition is at most 70-30
Why groups of 5 specifically? The branching coefficients are 1/5 (recursive call on medians) + 7/10 (worst-case recursive call on partition) = 9/10 < 1. That's what makes the recurrence converge:
T(n) ≤ T(n/5) + T(7n/10 + 6) + O(n) = O(n)
Groups of 3 fail: coefficients become 1/3 + 2/3 = 1, giving O(n log n). Groups of 5 is the minimum size where the geometry works and sorting each group is still O(1).

Split into groups of 5. Sort each group. Take the medians. Find the median of those medians. Use it as your pivot. Congratulations, you now have guaranteed O(n) worst case and a constant factor six times worse than the randomized version.
The catch: BFPRT uses roughly 22n comparisons in the worst case, versus ~3.4n expected for randomized quickselect. That ~6× overhead makes it impractical standalone. In practice, the standard library uses introselect: start with randomized quickselect, fall back to a guaranteed-linear algorithm if the recursion depth exceeds O(log n). That's what std::nth_element and numpy.partition implement.
How Each Operation Gets Its Complexity
| Operation | Time (avg) | Time (worst) | Space (avg) | Space (worst) |
|---|---|---|---|---|
| kth order statistic (recursive) | O(n) | O(n²) | O(log n) | O(n) |
| kth order statistic (iterative) | O(n) | O(n²) | O(1) | O(1) |
| kth order statistic (BFPRT) | O(n) | O(n) | O(log n) | O(log n) |
| kth order statistic (introselect) | O(n) | O(n) or O(n log n)* | O(log n) | O(log n) |
*libstdc++ std::nth_element falls back to heapselect (O(n log n) worst case), not pure median-of-medians. numpy claims O(n) worst case. The C++ standard requires only "average linear," not worst-case linear.
Space is O(log n) on average because the recursive call stack depth equals the number of partition rounds. With balanced pivots, that's O(log n) rounds. The iterative version collapses this to O(1) because quickselect is always a tail call: there's only ever one recursive call and it's the last thing done. Just update lo or hi in a loop.
The partition itself uses O(1) extra space. No auxiliary arrays.
One Structure, Every Language
import random def quickselect(nums: list[int], k: int) -> int: """Returns the kth smallest element (0-indexed). Mutates a copy.""" arr = nums[:] lo, hi = 0, len(arr) - 1 while lo < hi: p = _partition(arr, lo, hi) if p == k: return arr[k] elif p < k: lo = p + 1 else: hi = p - 1 return arr[lo] def _partition(arr: list[int], lo: int, hi: int) -> int: pivot_idx = random.randint(lo, hi) arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx] pivot = arr[hi] i = lo for j in range(lo, hi): if arr[j] <= pivot: arr[i], arr[j] = arr[j], arr[i] i += 1 arr[i], arr[hi] = arr[hi], arr[i] return i # Find the 3rd smallest (k=2) print(quickselect([7, 2, 4, 1, 6, 3, 5], 2)) # 3 # numpy ships introselect if you need it: # import numpy as np # a = np.array([7, 2, 4, 1, 6, 3, 5]) # np.partition(a, 2)[2] # 3
What Classes of Problems Call for This
Quickselect is the right tool when:
You need a specific rank from an unsorted array. Any percentile: minimum (k=0), maximum (k=n-1), median (k=n/2), 90th percentile (k=0.9n). One rank, one pass.
You need the top-k or bottom-k elements, not sorted. Run quickselect with rank k. After the call, arr[0..k-1] contains the k smallest elements in any order. No sorting needed.
Sorting feels like overkill. The brute force is "sort, then index," and you can see the O(n log n) work being thrown away. Full array, one rank, no output order required: that's the quickselect shape.
Quickselect is the wrong tool when:
- Multiple queries on the same array: sort once, then answer in O(1) each.
- Streaming queries after each insertion: use a max-heap and min-heap pair for dynamic median tracking in O(log n) per insertion.
- The input cannot be mutated: you need an O(n) copy, which is fine asymptotically but worth noting.
- The array is already sorted: indexing directly is O(1).
Signals That Quickselect Is the Right Tool
Signal 1: The problem asks for a rank and sorting "should work" but feels wasteful.
The problem says "kth largest," "find the median," or "return the top k elements." The input is given all at once. You notice that a sort-then-index approach gives O(n log n), and the problem constraints hint that O(n) is expected. That's the cue.
Signal 2: The brute force is sort() then [k], and you can see that sorted order is thrown away.
If you immediately reach for sort and only read one index, the sorted positions of all other elements are waste. Quickselect eliminates that waste.
Worked Example 1: Kth Largest Element (LeetCode 215)
Problem: given an unsorted array, return the kth largest element. Constraints imply O(n) or O(n log n). No specific sorted output required.
"Kth largest" is "kth smallest from the right." In 0-indexed terms, that's the (n-k)th smallest.
def findKthLargest(nums: list[int], k: int) -> int: return quickselect(nums[:], len(nums) - k)
Expected O(n). The heap approach (heapq.nlargest(k, nums)) runs in O(n log k), which is O(n log n) when k is proportional to n. Quickselect wins for large k.
No sorted output, single rank, full array available at once. Textbook fit.
Worked Example 2: Find the Median of an Unsorted Array
Problem: given n numbers, find the median in O(n) time.
def find_median(nums: list[int]) -> float: n = len(nums) if n % 2 == 1: return float(quickselect(nums[:], n // 2)) else: arr = nums[:] lo = quickselect(arr, n // 2 - 1) hi = quickselect(arr, n // 2) # arr already partially sorted from first call return (lo + hi) / 2.0
For even-length arrays, two calls are needed. Each is O(n), so the total is still O(n). The second call on an already partially sorted array tends to be faster in practice, since the first call left arr[0..n/2-1] ≤ median.
Sorting would cost O(n log n). A min-heap scan costs O(n log n). Quickselect is the one path to O(n).
"Find something about rank in an unsorted array without needing sorted output" almost always lands here.
Recap
- Quickselect finds the kth order statistic in O(n) expected time by partitioning and recursing into ONE subarray instead of two.
- After partition, the pivot is at its exact sorted rank. Discarding the other half loses zero information.
- Average O(n) follows from a geometric series: each level shrinks by at least a constant factor, and the levels sum like 1 + 3/4 + 9/16 + ... = 4.
- Worst-case O(n²) happens when the pivot is always the min or max. A random pivot makes this probability O(1/n!).
- The iterative version uses O(1) space: quickselect is always a tail call, so just update lo or hi in a loop.
- Three-way partition (Dutch National Flag) is required for arrays with many duplicate elements, otherwise all-equal arrays hit O(n²).
- Median of medians (BFPRT, 1973) guarantees O(n) worst case but with ~6× larger constants. Used as a fallback in introselect.
std::nth_element(C++) andnumpy.partitionboth ship introselect. Use them in production.- Quickselect mutates the array. Copy first if you need the original preserved.
- Hoare's scheme does ~3× fewer swaps than Lomuto but requires careful recursion bounds:
[lo, j]and[j+1, hi], NOT[lo, j-1].
If you want to practice explaining the geometric series argument under pressure and get real-time feedback on whether your reasoning landed, SpaceComplexity runs voice-based mock interviews with rubric-based scoring on algorithm depth. Worth running through before the actual interview.
The same partition step that makes quickselect fast also makes quicksort fast in different ways. The merge sort vs quicksort comparison walks through why recursing into both halves costs O(n log n) while one half costs O(n). For the alternative approach to top-k problems, the heap data structure explains the O(n log k) path and when you'd prefer it over O(n) quickselect. The "eliminate half the search space" reasoning behind quickselect's geometric series is the same intuition that makes binary search O(log n).
Further Reading
- Quickselect on Wikipedia, history, pseudocode, and the full CLRS-style average-case proof
- Median of Medians on Wikipedia, the BFPRT algorithm with the full recurrence derivation and groups-of-5 geometry
- std::nth_element on cppreference, the C++ standard library interface, complexity guarantees, and introselect behavior
- numpy.partition documentation, introselect interface and the
kthparameter semantics - Quickselect Algorithm on GeeksforGeeks, worked examples with Lomuto and Hoare variants