What Is Radix Sort? Sorting Integers Without a Single Comparison

June 5, 20268 min read
dsaalgorithmsinterview-prepdata-structures
What Is Radix Sort? Sorting Integers Without a Single Comparison
TL;DR
  • Radix sort bypasses the O(n log n) comparison-sort floor by grouping elements by digit position instead of comparing them directly.
  • LSD-first ordering is required for correctness: later passes carry higher priority, and earlier passes resolve tiebreakers that survive intact.
  • Stability is non-negotiable: each per-digit pass must use a stable subroutine (typically counting sort) or the output is garbage.
  • Radix sort time complexity is O(d × (n + k)): with bounded integers d is constant, making the sort effectively O(n).
  • Space complexity is O(n + k): the counting sort pass needs a size-n output array and a size-k count array.
  • Best applied to bounded integers or fixed-length strings; skip it for floats, variable-length strings, or custom comparators.
  • Interview signal: explaining why comparison sort has a floor and how radix sort sidesteps it distinguishes a Strong Hire answer.

Every sorting algorithm you've ever studied works by comparing pairs of elements. Merge sort splits and merges by comparing. Quicksort pivots around comparisons. Insertion sort is basically just comparisons with extra steps. Theoreticians proved there's a floor: no comparison-based sort can beat O(n log n) in the worst case. The math is airtight. It's law.

Radix sort looked at that law and kept scrolling.

It never compares two elements. It reads digits. And somehow, that's enough.

No Comparisons, Just Digits

Radix sort groups numbers by individual digit positions. One pass per digit position, from least significant to most significant, using a stable sort on each pass. After the final pass, the array is sorted.

You don't need to compare 802 and 24 to sort them. You just need to know their digits.

"Radix" means base, as in base-10, base-2, or base-256. Sorting in base 10 means processing the ones place, then the tens place, then the hundreds. The number of passes equals the digit count of the largest value.

Simple idea. The tricky part is understanding why it works, and why the order of passes matters. (Spoiler: you're probably about to guess wrong.)

Walk Through It Once

Start with: [170, 45, 75, 90, 802, 24, 2, 66]

The largest value is 802, so three passes.

Pass 1: sort by the ones digit

Extract the ones digit from each number and group into buckets:

Digit 0: [170, 90]
Digit 2: [802, 2]
Digit 4: [24]
Digit 5: [45, 75]
Digit 6: [66]

Collect in bucket order, preserving original order within each bucket:

[170, 90, 802, 2, 24, 45, 75, 66]

Pass 2: sort by the tens digit

Digit 0: [802, 2]   (802 has tens digit 0; so does 2, treated as 002)
Digit 2: [24]
Digit 4: [45]
Digit 6: [66]
Digit 7: [170, 75]
Digit 9: [90]

After pass 2:

[802, 2, 24, 45, 66, 170, 75, 90]

Pass 3: sort by the hundreds digit

Digit 0: [2, 24, 45, 66, 75, 90]   (all values under 100 get digit 0)
Digit 1: [170]
Digit 8: [802]

After pass 3:

[2, 24, 45, 66, 75, 90, 170, 802]

Sorted. Three passes, zero comparisons.

Why Least-Significant Digit First?

LSD-first is not arbitrary. It's what makes the algorithm correct. And yes, every time you first encounter this, your instinct is to go the other direction. That instinct is wrong.

Imagine going MSD-first: hundreds digit first. Your first pass puts 2, 24, 45, 66, 75, 90 before 170 before 802. Good start. Then your tens-digit pass fires and scatters 802 and 2 into the same bucket because they both have a tens digit of 0. Everything the first pass built? Gone.

Going LSD means each later pass has higher priority than the one before. The hundreds-digit pass is last, so it gets final say on the primary sort order. For numbers that share a hundreds digit, the tiebreaker was already settled by the earlier passes: 24 comes before 45 because the ones and tens passes put them there, and that ordering survives intact when the hundreds pass doesn't touch them.

That only holds if each individual pass is stable. When 802 and 2 both land in the "digit 0" bucket on the tens pass, they must stay in the order they arrived from the ones pass. If the inner sort shuffles them, the whole thing produces garbage.

This is why radix sort requires a stable subroutine. Counting sort is the standard choice because it runs in O(n + k) and is naturally stable when implemented with prefix sums. More on the stability property in stable sort.

The Algorithm in Code

def radix_sort(nums: list[int]) -> list[int]: if not nums: return nums max_val = max(nums) exp = 1 while max_val // exp > 0: nums = _counting_sort_by_digit(nums, exp) exp *= 10 return nums def _counting_sort_by_digit(nums: list[int], exp: int) -> list[int]: n = len(nums) output = [0] * n count = [0] * 10 for num in nums: digit = (num // exp) % 10 count[digit] += 1 for i in range(1, 10): count[i] += count[i - 1] for i in range(n - 1, -1, -1): digit = (nums[i] // exp) % 10 count[digit] -= 1 output[count[digit]] = nums[i] return output

The backwards iteration in the last loop is what preserves stability: elements seen later in the input are placed later within their bucket, so earlier elements stay earlier. The outer loop doubles exp each time to shift to the next digit position.

Radix Sort Time Complexity

Time: O(d * (n + k))

Each pass through one digit position costs O(n + k), where n is the number of elements and k is the base (10 for decimal). With d digit positions, total cost is O(d * (n + k)).

For integers bounded by some maximum value W, d = log₁₀(W). Sorting 32-bit integers means at most 10 decimal digits. Either way, d is a constant when W is fixed, and complexity collapses to O(n).

For bounded integers, radix sort is effectively linear.

Space: O(n + k)

The counting sort pass needs an output array of size n and a count array of size k. With base 10, k = 10, so space is O(n).

How It Breaks the O(n log n) Barrier

The comparison-sort lower bound is a decision-tree argument. Any comparison sort produces a tree of yes/no branching decisions. To sort n distinct elements, the tree needs at least n! leaves. A binary tree with n! leaves needs at least log₂(n!) branches deep, which by Stirling's approximation is Θ(n log n).

Radix sort never branches on a comparison. It extracts a digit, increments a counter, places an element in a predetermined slot. The decision tree proof simply doesn't apply to it.

The cost gets paid in a different currency. Radix sort requires values with structure you can extract. It works on integers and fixed-length strings. It doesn't work on arbitrary objects where comparison is your only operation. Quicksort can sort dates, timestamps, custom records. Radix sort needs a digit.

The O(n log n) floor is a floor for a specific game. Radix sort plays a different game.

When to Use It (and When to Walk Away)

Use it when:

  • You're sorting a large array of bounded integers or fixed-length strings
  • Key length d is small relative to n, making O(d * n) better than O(n log n)
  • You need stable output without the overhead of merge sort

Skip it when:

  • You have floats (requires bit manipulation to treat them as integers, possible but messy)
  • You have variable-length strings (padding inflates d)
  • You have custom objects with only a comparator, not digit extraction
  • n is small, where O(n log n) sorts win in practice because of cache efficiency

For the specific question of counting sort versus radix sort, counting sort vs radix sort breaks down the tradeoffs directly.

What It Signals in an Interview

Interviewer: "Sort an array of 0s, 1s and 2s." "Me: First I will sort using Bubble Sort and then..." Interviewer: "My grandma runs faster than your code"

Every developer who knows only one sorting algorithm, in interview form.

Radix sort rarely appears as the named subject of a LeetCode problem. What it does is signal depth when you invoke it correctly.

If an interviewer asks you to sort a million integers between 0 and 1000 as efficiently as possible, O(n log n) is passable. O(n) with a correct explanation of why comparison sort has a floor and how radix sort sidesteps it is a completely different category of answer. That kind of response ends up in the Strong Hire section of a feedback write-up.

It also shows up in disguise. Problems involving frequency sorting, bucket-based grouping, or anything where you know the value range often hint toward linear-time sorting. Recognizing the structural precondition (bounded values, extractable keys) is the actual skill being tested, not the algorithm name.

It connects to a cluster of concepts interviewers use to check whether you think in models or just memorized code: counting sort, stable sorting, the O(n log n) floor. Knowing how they fit together is the signal. A candidate who can explain why stability is non-negotiable in radix sort has demonstrated something that correct-but-silent problem solving never will.

If you want to practice explaining this verbally, under time pressure, to someone probing your reasoning, that's the drill SpaceComplexity runs. Voice-based mock interviews with rubric-based feedback catch the gaps that grinding LeetCode silently won't.

For a deeper look at what the O(n log n) floor actually means, what is big-O notation covers the comparison-sort proof in more detail.

Further Reading