Radix Sort: Sort a Million Integers Without a Single Comparison

Herman Hollerith needed to sort 62 million census records in 1887. His machine read punch cards one column at a time, routing each card into a bin based on the current digit, then collecting the bins in order and repeating for the next column. That machine became the first recorded use of radix sort. It also eventually became IBM.

The radix sort algorithm is the same today. Sort by the least significant digit first, then the next digit, then the next. No comparisons between elements. No pivots. Just frequency counts, prefix sums, and patience. For fixed-width integers, this runs in linear time, which makes it faster than any comparison-based sort on large inputs. The catch is understanding exactly when that claim holds and when it quietly falls apart on you.

Why Comparison-Based Sorts Hit a Hard Floor

Any algorithm that sorts by comparing pairs of elements must make at least Ω(n log n) comparisons in the worst case. The proof is elegant. Model the algorithm as a binary decision tree: each internal node compares two elements, the left branch is "a < b," the right branch is "a ≥ b." The leaves are the possible output orderings. There are n! possible orderings of n elements, so the tree needs at least n! leaves. A binary tree with n! leaves has height at least log₂(n!), and by Stirling's approximation, log₂(n!) = Θ(n log n). That floor is unavoidable for any comparison-based approach.

Radix sort sidesteps this completely by never comparing elements directly. The lower bound proof only applies to algorithms that learn sort order through comparisons. Radix sort uses digit values, not element order relationships. The proof simply doesn't apply.

For more on Big-O and this lower bound, the Big-O Notation guide covers the decision-tree argument in full. For a head-to-head look at what this means in practice, see Merge Sort vs Quick Sort.

The feeling of escaping the O(n log n) floor vs. forgetting what an interface is

Reducing a function from O(n²) to O(1) Sunday night vs. explaining what an interface does Wednesday afternoon. The algorithm knowledge is still in there somewhere.

Why You Always Start on the Right

Take the array [170, 45, 75, 90, 802, 24, 2, 66]. LSD (Least Significant Digit) radix sort processes digit positions from right to left.

Pass 1 sorts by the ones digit:

Digit	Elements
0	170, 90
2	802, 2
4	24
5	45, 75
6	66

Collect buckets in order: [170, 90, 802, 2, 24, 45, 75, 66]. Not fully sorted. But correctly sorted by ones digit.

Pass 2 sorts by the tens digit (using the result from pass 1):

Digit	Elements
0	802, 2
2	24
4	45
6	66
7	170, 75
9	90

Collect: [802, 2, 24, 45, 66, 170, 75, 90]. Now correctly sorted by the last two digits.

Pass 3 sorts by the hundreds digit:

Digit	Elements
0	2, 24, 45, 66, 75, 90
1	170
8	802

Collect: [2, 24, 45, 66, 75, 90, 170, 802]. Three passes, zero comparisons between elements, fully sorted.

LSD Radix Sort: three passes on [170, 45, 75, 90, 802, 24, 2, 66], showing each element routing into its digit bucket and collecting in order

Each pass is a stable counting sort on one digit position. Three passes, zero comparisons, fully sorted.

LSD vs MSD: Which End Do You Start From?

LSD works right to left with a fixed number of passes, and it's stable by construction. Most people who say "radix sort" mean LSD.

MSD (Most Significant Digit) works left to right via recursion. It can terminate early when a subgroup is already uniform on the remaining digits, handles variable-length strings naturally, and is the basis of American flag sort. The cost is recursion overhead and cache-unfriendly thrashing as subproblems shrink.

For sorting integers, LSD is almost always the right choice. For sorting strings lexicographically, MSD is worth considering. This article focuses on LSD.

Each Pass Is a Stable Counting Sort

Each pass of radix sort is a counting sort restricted to a single digit position. Stability is not optional: use an unstable sort at any pass and the algorithm silently produces wrong output. No exception thrown. No assertion failed. Just wrong answers, returned confidently.

Counting sort for a single digit position works in three steps.

Step 1: Count. Build a frequency array count of size k (the base). For each element, increment count[digit] where digit is the current digit position value.

Step 2: Prefix sum. Transform the count array in place. After this step, count[d] holds the total number of elements with digit value ≤ d. This gives the exclusive upper bound on the output index range for digit d.

Step 3: Backwards fill. Iterate the input array from right to left. For each element with digit d, place it at output index count[d] - 1 and decrement count[d].

The backwards direction is the stability mechanism. The last element in the input with digit d lands at the last available slot for digit d. The second-to-last element lands one position earlier. The relative order of equal-keyed elements is preserved exactly as they appeared in the input. Going forward would reverse that relative order, which would break correctness on the next pass.

Counting sort pass traced step by step: input digits, count array after counting, prefix sum transformation, backwards fill sequence, and final output

The backwards fill is the whole trick. Going forward would reverse the relative order of equal-keyed elements and break the next pass.

Here is the count array trace for pass 1 on [170, 45, 75, 90, 802, 24, 2, 66] (ones digit):

Input digits: [0, 5, 5, 0, 2, 4, 2, 6]

After counting:  count = [2, 0, 2, 0, 1, 2, 1, 0, 0, 0]
After prefix:    count = [2, 2, 4, 4, 5, 7, 8, 8, 8, 8]

Backwards fill (i from 7 to 0):
  i=7: 66  → digit 6 → output[7]=66,  count[6]=7
  i=6: 2   → digit 2 → output[3]=2,   count[2]=3
  i=5: 24  → digit 4 → output[4]=24,  count[4]=4
  i=4: 802 → digit 2 → output[2]=802, count[2]=2
  i=3: 90  → digit 0 → output[1]=90,  count[0]=1
  i=2: 75  → digit 5 → output[6]=75,  count[5]=6
  i=1: 45  → digit 5 → output[5]=45,  count[5]=5
  i=0: 170 → digit 0 → output[0]=170, count[0]=0

Output: [170, 90, 802, 2, 24, 45, 75, 66]

Sorted by ones digit, with original relative order preserved within each digit group.

Why LSD Actually Works: The Correctness Proof

CLRS Lemma 8.3 states that LSD radix sort correctly sorts n d-digit numbers in Θ(d(n + k)) time, provided the stable sort used at each pass takes Θ(n + k) time.

The proof is induction on the digit position being sorted. The loop invariant: after i passes (sorting digit positions 1 through i, ones place through 10^(i-1) place), the array is correctly sorted on its last i digits.

Base case: before any passes, the array is trivially sorted on zero digits.

Inductive step: assume after i − 1 passes the array is correctly sorted on its last i − 1 digits. Now run counting sort on digit position i.

Two elements with different digits at position i: counting sort places them in the correct relative order based on that digit.
Two elements with the same digit at position i: stability preserves their relative order from the previous step. By the inductive hypothesis, that relative order is already correct for the last i − 1 digits.

After d passes, the array is sorted on all d digits. QED.

The proof works because stability converts "the subroutine knows about one digit" into "the subroutine preserves knowledge about all previous digits for free." Work done in earlier passes accumulates through stability rather than being redone.

Stability proof: two elements (45 and 75) with the same hundreds digit enter pass 3 in correct relative order from pass 2. A stable pass preserves that order; an unstable pass scrambles it.

Stable: previous-pass work accumulates for free. Unstable: every pass starts from scratch and gets the wrong answer.

How Fast? How Much Space?

Operation	Time	Space
Single counting-sort pass	O(n + k)	O(n + k)
Full LSD sort (d passes)	O(d(n + k))	O(n + k)
Finding the maximum (to determine d)	O(n)	O(1)

Here n is the element count, k is the base (number of distinct digit values), and d is the number of digit positions in the largest element.

Best, average, and worst case are identical. Radix sort never branches on the values being sorted, only on digit indices. The input arrangement has zero effect on runtime. There are no lucky pivots and no adversarial inputs.

Space is O(n + k): one output buffer of size n (reused across all passes) plus one count array of size k (also reused).

The linear-time claim has a hidden condition

For 32-bit integers sorted in base 10: d = 10, k = 10, total time = O(10 × (n + 10)) = O(n). That looks linear. And it is, because d and k are constants independent of n.

But suppose your inputs can be as large as n^c for some constant c. Then d = Θ(log n), and the total cost becomes O(n log n). Same as merge sort, with worse constants. Radix sort is genuinely linear only when d is a constant independent of n. This holds for fixed-width integer types (int32, int64) and fixed-length strings. It does not hold for arbitrary-precision integers where the key size can grow with n. That last sentence is the fine print. The one nobody reads until it's 3am and the sort is taking four minutes on data that was supposed to be "just integers."

The Base Is a Dial, Not a Fixed Setting

The formula O(d(n + k)) hides a dependency: the base k determines both the number of passes d and the cost of each pass. Larger k means fewer passes but more buckets. Smaller k means more passes but cheaper individual passes.

For W-bit integers, d = ⌈W / log₂(k)⌉. Substituting: total time = O((W / log₂(k)) × (n + k)). Setting k = n minimizes this expression asymptotically (d ≈ W / log₂(n) passes, each costing O(n)), giving O(Wn / log₂(n)) total operations. For fixed W, that is O(n) as n grows.

The catch: k = n means a count array with n entries. For n = 10 million, that is a 40 MB array that does not fit in L2 or L3 cache. Every count array access is a cache miss. You have, in effect, traded your O(n) theoretical speedup for a barrage of DRAM round trips. The CPU is not impressed.

Base 256 is the practical sweet spot. For 32-bit integers, it uses four passes (one per byte). The count array has 256 entries, fits in 1 KB, and sits entirely in L1 cache. Each pass accesses memory sequentially. Empirically, base-256 implementations run 2 to 5 times faster than base-10 on large inputs despite identical asymptotic complexity.

The code examples below use base 10 for readability. In performance-critical code, switch to base 256 and extract bytes with (value >> (8 * pass)) & 0xFF.

Two-axis chart showing passes decreasing and count array size increasing as base grows. Base 256 sits at the sweet spot where both are manageable: 4 passes and a 1 KB count array.

Passes shrink fast up to base 16, then flatten out. Count array size grows linearly. Base 256 is where the curves stop fighting each other.

Two Traps You'll Step Into

Standard LSD Radix Sort Only Handles Non-Negative Integers

If your input contains negative numbers, the algorithm silently produces wrong output. It finishes. It returns. It looks like it worked. The output is quietly, confidently wrong. Negative values in base 10 would use a minus sign, which is not a digit; in binary, the sign bit sits in the most significant position, which LSD processes last but treats as just another bit.

The clean fix for mixed inputs: partition the array into negatives and non-negatives, sort each half independently (for negatives, sort their absolute values and reverse the result), then concatenate the sorted negatives before the sorted non-negatives.

For base-2 radix sort on signed integers, flip the sign bit during the final pass. In two's complement, flipping the sign bit swaps the ordering of positive and negative halves into the correct relative order.

Losing Stability in Your Subroutine

Substituting an unstable sort (quicksort, heapsort, selection sort) for counting sort breaks the algorithm. Each pass scrambles the relative order of equal-keyed elements, destroying the work of all previous passes. The algorithm terminates, returns a result, and gives you exactly the amount of confidence you did not deserve.

A quick sanity test: run your subroutine on [20, 21] with the digit fixed to the ones place. Both have ones digit 0 and 1 respectively, so order is determined. Now try [10, 11] fixing the tens place: both have tens digit 1, so their order must be preserved as-is. If [10, 11] comes back as [11, 10], your subroutine is unstable.

One Structure, Every Language

Each implementation below sorts non-negative integers using LSD radix sort in base 10. The counting_sort_by_digit / countingSortByDigit function handles one pass; the outer radix_sort / radixSort function loops over digit positions.

def counting_sort_by_digit(arr: list[int], exp: int) -> None:
    n = len(arr)
    output = [0] * n
    count = [0] * 10

    for val in arr:
        count[(val // exp) % 10] += 1

    for i in range(1, 10):
        count[i] += count[i - 1]

    for i in range(n - 1, -1, -1):
        digit = (arr[i] // exp) % 10
        count[digit] -= 1
        output[count[digit]] = arr[i]

    for i in range(n):
        arr[i] = output[i]


def radix_sort(arr: list[int]) -> None:
    if not arr:
        return
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10

What Problems Does Radix Sort Actually Solve?

Integer sorting at scale

For millions of 32-bit integers, base-256 radix sort runs four passes (eight for 64-bit), touches each element a constant number of times, and compares nothing. It is the right tool when n is large and the keys are fixed-width integers.

Stable multi-key sorting

Radix sort is inherently stable, which makes it a natural fit for multi-key sorts. Sort by the least significant key first, then each successive key. When records tie on the current key, stability carries their correct relative order from the previous pass. This is how databases execute ORDER BY a, b, c with a sort-based plan.

String sorting

MSD radix sort sorts strings lexicographically in O(total characters) time. Sedgewick and Wayne's Algorithms textbook builds an entire string processing chapter around MSD and 3-way string quicksort. For sorting large collections of strings with bounded length, MSD radix sort or its hybrids are hard to beat.

Suffix array construction

The DC3/Skew algorithm (Kärkkäinen and Sanders, 2003) and SA-IS algorithm (Nong, Zhang, and Chan, 2009) both use radix sort internally to build suffix arrays in O(n) time. Without linear-time sorting of fixed-length triplets, these O(n) constructions are not possible. If you have used bzip2, BWA, or Bowtie for DNA alignment, you have used radix sort indirectly.

Tries and radix sort share a conceptual ancestor: both decompose keys character-by-character. The Trie Data Structure article explores that decomposition in the context of prefix matching.

Histogram-based problems

Any problem that needs to bucket integers by magnitude and process each bucket separately can use counting sort's first step as the bucketing mechanism. Bucket sort is essentially one pass of counting sort followed by independent sorting of each bucket.

When to Reach for Radix Sort

These signals suggest radix sort is worth considering:

Sorting large collections of fixed-width integers (int32, int64, uint64).
A stable integer sort is required and the key range is bounded.
Sorting strings of fixed or bounded length lexicographically.
Implementing a multi-key sort where keys can be decomposed into digit positions.
The input alphabet is small (DNA: 4 symbols, ASCII: 128 values, bytes: 256 values).
You are building or optimizing a string index structure (suffix array, inverted index).

Example 1: Sorting 10 million UNIX timestamps

You have 10 million 32-bit UNIX timestamps (values in [0, 2^31)). You need them sorted for time-based partitioning.

Base-256 radix sort: 4 passes, count array of 256 entries (1 KB, fits in L1 cache), O(4 × (10M + 256)) ≈ O(40M) operations. Each pass is a sequential scan of the input and a sequential write to the output. The hardware prefetcher handles both.

Quicksort: O(n log n) ≈ O(10M × 23) ≈ O(230M) comparisons, plus cache-unfriendly random-access partitioning. In practice, base-256 radix sort wins by a factor of 2 to 5 on inputs this size.

The signal: fixed-width keys (32-bit), large n (10M), constant digit count (d=4 with base 256).

Example 2: Suffix array construction for DNA alignment

You're building a suffix array for a 1 GB DNA sequence. The alphabet is {A, C, G, T}, so k = 4.

Naive comparison-based sort of all n suffixes takes O(n² log n) in the worst case (each string comparison can scan the entire remaining suffix). Even with LCP tricks, getting below O(n log² n) requires careful work.

The DC3 algorithm achieves O(n) by repeatedly radix-sorting short triplets of characters. At each level, triplets have alphabet size at most k = 4, so counting sort takes O(n + 4) = O(n). The recursion depth is O(log n), but each level is O(n), giving O(n) total.

The signal: tiny alphabet (k=4), massive n, need stable sort to preserve relative ranks for recursive construction.

This pattern also shows up in sorting IP addresses (treat each 32-bit address as four 8-bit fields), sorting fixed-length hashes, and sorting records with composite integer keys.

What You Need to Remember

Radix sort avoids the O(n log n) comparison-sort floor by never comparing elements. It uses digit values instead of order relationships.
LSD (least significant digit first) processes digit positions right to left with a fixed number of passes.
Each pass is a stable counting sort: count occurrences, compute prefix sums, fill output backwards.
Stability is non-negotiable. An unstable subroutine corrupts the output silently.
The correctness proof is induction on digit position. Different digits at position i: counted correctly. Same digit at position i: stability preserves previous-pass order.
Time: O(d(n + k)). Space: O(n + k). All three cases (best, average, worst) are identical.
Linear time holds only when d is a constant. If d = O(log n), total cost is O(n log n).
Base 256 is the practical sweet spot: 4 passes for 32-bit integers, 1 KB count array fits in L1 cache.
Standard implementations only handle non-negative integers. Negatives require extra handling.
Real uses: large-scale integer sorting, stable multi-key sorting, suffix array construction, string sorting, histogram aggregation.

If you want to practice recognizing when radix sort applies and explaining the complexity analysis out loud under time pressure, SpaceComplexity runs voice-based mock interviews with rubric-based feedback on exactly these skills. The difference between knowing the algorithm and being able to explain it in 45 minutes is the part most people skip.