Counting Sort Algorithm: How to Sort an Array Without a Single Comparison

May 26, 202618 min read
dsaalgorithmsinterview-prepdata-structures
Counting Sort Algorithm: How to Sort an Array Without a Single Comparison
TL;DR
  • Counting sort algorithm bypasses the Ω(n log n) lower bound by reading values as array indices instead of comparing elements.
  • O(n + k) time and space where k is the value range; when k = O(n), that's linear sorting in every case, not just average.
  • Three phases: count frequencies (O(n)), compute prefix sums (O(k)), place elements right-to-left (O(n)) — costs add, never multiply.
  • Stability comes from the right-to-left placement pass; the simple unstable variant emits each value count[v] times but loses record identity.
  • Radix sort depends on counting sort as its stable inner loop — without stability each digit pass corrupts the previous one.
  • Three interview signals: explicit bounded integer range in constraints, sort-by-frequency structure, or n ≤ 10^6 with values in [0, n].
  • Practical ceiling around k ≤ 10^6; when k >> n the O(k) initialization dominates and comparison sorts win.

You learn early that you cannot sort faster than O(n log n). Every comparison-based algorithm is stuck there, and the proof is airtight: model any comparison sort as a binary decision tree, count the n! leaves, and you get a height of at least log₂(n!) ≈ n log n. No comparison sort escapes this.

Counting sort didn't get the memo. It sorts n elements in O(n + k) time, where k is the range of your values. When k = O(n), that's O(n). Linear. Faster than merge sort, heap sort, and quicksort. Every one of them.

Instead of asking "which element is smaller?", counting sort asks "how many elements have exactly this value?" and uses array indexing to answer in O(1). No comparison happens at all. The sorted order falls out of a frequency table.

Reach for it when keys are small non-negative integers and k is modest relative to n. Harold Seward described the algorithm at MIT in 1954, the same year he described radix sort, because he needed counting sort as its inner loop. That relationship hasn't changed.

The Rule It Isn't Breaking

The Ω(n log n) lower bound is a theorem about comparison-based sorting. The decision tree proof counts how many comparisons you need to distinguish among all n! possible input orderings. Each comparison answers one binary question, so you need at least log₂(n!) steps, which is Θ(n log n).

Counting sort sidesteps this because it never runs a comparison. It reads each input value as an array index and increments a counter. That's a memory access, not a comparison. The decision-tree model simply doesn't apply. Neither does the lower bound.

The tradeoff is real. You pay O(k) in both time and space to set up the count array. If k grows to 2^32 and your array has 100 elements, you're initializing four billion cells for a hundred values. That's four billion writes for a hundred elements. The algorithm is no longer helping you. When k is proportional to n or small in absolute terms (character values, grades, ages, single-digit IDs), it wins cleanly.

Bins, Not Comparisons

Picture sorting a pile of index cards, each labeled with a number from 1 to 5. You don't need to compare them. You put five labeled bins on the table, drop each card into its bin, then read the bins left to right. Done.

A comparison sort would sit every card down and ask "are you bigger than this one?" Counting sort just sends everyone directly to their assigned slot. No card ever touches another card.

The only subtlety is sorting records with attached data rather than bare integers, and that requires one extra step: the prefix sum.

Three Passes, Each O(n) or O(k)

The algorithm runs three sequential passes. None are nested. The costs add together, they don't multiply, which is why the total is O(n + k) and not O(n * k).

Three sequential phases of counting sort: Count Frequencies, Prefix Sum, Place Right-to-Left. Costs add, they do not multiply.

Three phases, no nesting. O(n) + O(k) + O(n) = O(n + k), not O(n * k).

Phase 1: Build the Frequency Table

Allocate a count array of size k + 1, initialized to zero. Walk the input once, incrementing count[v] for each value v.

Input:   [4, 1, 3, 1, 4, 2]

Index:    0  1  2  3  4
count:   [0, 2, 1, 1, 2]
              ^        ^
          two 1s    two 4s

Cost: O(n) to scan the input, O(k) to initialize the count array. No comparison happens anywhere in this phase.

Input array [4,1,3,1,4,2] with arrows pointing each value into its corresponding count array slot. Values become indices, not operands.

Each value goes directly into its slot. There is no "is 4 bigger than 3?", just count[4]++.

Phase 2: Prefix Sum Transforms Counts Into Positions

This is the step everyone skips the first time. They get a sorted array of integers and a completely scrambled set of records. Then comes the debugger. Don't skip this step.

Transform the count array into a cumulative sum: for each i from 1 to k, set count[i] += count[i - 1]. After this transformation, count[v] no longer means "how many v's are there." It means "how many elements are ≤ v", which is exactly the 1-indexed output position of the last v in the sorted result.

Before prefix sum:   [0, 2, 1, 1, 2]

After prefix sum:    [0, 2, 3, 4, 6]
                          |  |  |  |
count[1] = 2  --> last 1 goes at output index 1  (0-indexed)
count[2] = 3  --> last 2 goes at output index 2
count[3] = 4  --> last 3 goes at output index 3
count[4] = 6  --> last 4 goes at output index 5

Cost: O(k) to scan the count array once.

Before and after the prefix sum transformation. count[v] changes meaning from "how many v's" to "last output slot for v".

Before: counts. After: positions. One O(k) pass turns one into the other.

Phase 3: Place Elements Right to Left

Walk the input from right to left. For each element v, place it at index count[v] - 1 in the output array, then decrement count[v].

Input (right to left):  2, 4, 1, 3, 1, 4
count at start:        [0, 2, 3, 4, 6]
output:                [_, _, _, _, _, _]

v=2  -->  output[count[2]-1] = output[2] = 2,  count[2] becomes 2
v=4  -->  output[count[4]-1] = output[5] = 4,  count[4] becomes 5
v=1  -->  output[count[1]-1] = output[1] = 1,  count[1] becomes 1
v=3  -->  output[count[3]-1] = output[3] = 3,  count[3] becomes 3
v=1  -->  output[count[1]-1] = output[0] = 1,  count[1] becomes 0
v=4  -->  output[count[4]-1] = output[4] = 4,  count[4] becomes 4

output: [1, 1, 2, 3, 4, 4]  ✓

Cost: O(n) to scan the input.

Phase 3 step-by-step: right-to-left scan places each element using count[v]-1, then decrements. Equal-key elements land in original relative order.

Three steps shown. The rightmost element with a given value always gets the rightmost reserved slot. Original order preserved.

Why Right to Left Makes It Stable

There's a simpler way to build the output that skips all of this. After counting, just iterate through the count array and emit each value count[v] times:

# Simple variant, not stable result = [] for val, freq in enumerate(count): result.extend([val] * freq)

This works correctly for sorting bare integers. Fast, clean, only O(k) extra space.

But if your values are keys attached to records (sort employees by age, or sort packets by priority), the simple variant throws away all record identity. You get the right integer sequence but the wrong records.

The right-to-left pass makes counting sort stable: two elements with equal keys appear in the output in the same relative order they had in the input.

After the prefix sum step, count[v] is one past the last reserved slot for v. Scanning right-to-left, the rightmost v in the input arrives first. It gets placed at count[v] - 1 (the last slot for v), then the count decrements. The second-rightmost v arrives next and gets placed one slot earlier. Original order preserved.

Stability is exactly why counting sort is the inner loop of radix sort. Radix sort sorts integers digit by digit, from least significant to most significant. Each digit pass must be stable, or a previous pass's ordering gets corrupted. Counting sort delivers that guarantee in O(n + k) time.

Time and Space: Why O(n + k) Holds

PhaseTimeSpace
Initialize count arrayO(k)O(k)
Count frequencyO(n)O(1) additional
Prefix sumO(k)O(1) in-place
Place right-to-leftO(n)O(n) for output
Total (stable)O(n + k)O(n + k)
Simple (unstable) variantO(n + k)O(k)

The total is O(n + k) because the four phases run sequentially, not nested: their costs add, they don't multiply. There are no loops inside loops. Each element is touched at most twice (once in the count phase, once in the placement phase). Each count-array cell is touched at most three times (initialization, prefix sum, and one decrement).

There is no worst case distinct from the average. Counting sort does not care whether the input is sorted, reversed, or random. The same passes run regardless.

Once k grows past roughly O(n log n), a general comparison sort is cheaper. A common rule of thumb is k ≤ 10^6 as a safe absolute bound, but the real threshold depends on your memory budget and whether the O(k) initialization fits in cache.

Negative Values Need an Offset

The standard algorithm assumes values in [0, k]. For arrays with negative values, compute the minimum element, subtract it from every value before sorting, then add it back afterward. The range becomes max - min + 1.

def counting_sort_general(arr: list[int]) -> list[int]: if not arr: return [] lo, hi = min(arr), max(arr) shifted = [x - lo for x in arr] return [x + lo for x in counting_sort(shifted, hi - lo)]

One Structure, Every Language

Each implementation below sorts integer arrays with values in [0, max_val] using the stable three-phase algorithm.

def counting_sort(arr: list[int], max_val: int) -> list[int]: count = [0] * (max_val + 1) for x in arr: count[x] += 1 for i in range(1, max_val + 1): count[i] += count[i - 1] output = [0] * len(arr) for i in range(len(arr) - 1, -1, -1): output[count[arr[i]] - 1] = arr[i] count[arr[i]] -= 1 return output

What Problems Does It Solve?

Counting sort fits a narrow but recurrent category of problems. You need two things:

  1. Keys are non-negative integers (or can be mapped to them).
  2. The range k is small relative to n, or small in absolute terms.

When both conditions hold and you need guaranteed linear time, counting sort is the answer. Not expected linear, not amortized linear. Worst-case linear every time, on every input.

The concrete shapes:

  • Bounded integer keys: ages (0-120), HTTP status codes (100-599), exam grades (0-100), ASCII characters (0-127), single-digit values. Any time the problem constrains values to a small range, sorting by counting is faster than sorting by comparison.
  • Frequency distribution into sorted order: count how often each value appears, then process in value order. The count phase alone is counting sort's first phase. Build on it.
  • The "values in [1, n]" shape: a classic signal in competitive programming and interview problems. n elements, values all in [1, n], sort in linear time. Counting sort with k = n.
  • Radix sort subroutine: radix sort extends counting sort's reach to large integers by sorting digit by digit. Each digit pass is counting sort with k = the digit base (typically 10 or 256). Counting sort's stability is what makes the composition correct.

What it does not solve: floating-point keys, string keys with large character sets, arbitrary 32-bit integer arrays (k = 4 billion, not practical). For those cases, look at Merge Sort vs Quicksort for the tradeoffs among comparison sorts.

Three Signals to Reach for Counting Sort

Pattern recognition for counting sort is unusually easy because the key signal is in the problem constraints, not hidden in the problem logic. The constraint is the hint. You don't need to figure it out from the examples.

Signal 1: An Explicit Bounded Integer Range

If the problem states "values in [1, 100]" or "each element is a lowercase letter" or "all ages between 0 and 120", compute k. If k is small relative to n or small in absolute terms, counting sort applies directly.

LeetCode 1051: Height Checker

Students should stand in non-decreasing order of heights. Count how many are out of position.

Constraints: 1 <= heights[i] <= 100. That's k = 100, fixed regardless of n. Sort the expected order in O(n + 100) = O(n), then compare element-by-element.

def heightChecker(heights: list[int]) -> int: count = [0] * 101 for h in heights: count[h] += 1 expected = [] for val in range(101): expected.extend([val] * count[val]) return sum(1 for a, b in zip(heights, expected) if a != b)

The constraint pins k to a constant. Sorting 100 categories of students is O(n), not O(n log n). Any comparison sort here is slower for no reason.

Signal 2: Sort by Frequency

Any problem whose core move is tallying discrete values and then ordering by that tally is counting sort in disguise. The count phase you already need for the tally is the count phase of the sort.

LeetCode 451: Sort Characters By Frequency

Given a string, sort it in decreasing order of character frequency.

def frequencySort(s: str) -> str: count = [0] * 128 for c in s: count[ord(c)] += 1 # Sort 128 character indices by frequency, not n characters chars = sorted(range(128), key=lambda i: -count[i]) return ''.join(chr(c) * count[c] for c in chars)

The key insight: you are not sorting n characters. You are sorting 128 integer indices by a precomputed frequency. The expensive sort is over 128 elements, always. The n-cost part is the count pass, which is O(n). Total: O(n + 128 log 128) = O(n).

Signal 3: The Range Mismatch in the Constraints

If n can be 10^5 or 10^6 and values are in [0, n], an O(n log n) sort takes roughly 17n to 20n operations. An O(n + n) sort takes 2n. That's roughly a 10x speedup, and problems with tight time limits at these sizes often require it.

When you see n ≤ 10^6 with values in [0, 10^6], the intended solution is almost certainly counting sort, radix sort, or bucket sort. A comparison sort will often TLE. The constraint mismatch is the hint, and it's sitting right there in the problem statement.

Counting Sort Algorithm: Recap

  • Counting sort sorts n elements in O(n + k) time and O(n + k) space, where k is the value range.
  • It bypasses the Ω(n log n) lower bound because it never compares elements. It reads values as array indices.
  • Three phases: count frequencies (O(n + k)), compute prefix sums (O(k)), place right-to-left (O(n)).
  • The right-to-left placement is what makes it stable. Skip it and you get O(k) space but lose stability and record identity.
  • The simple (unstable) variant just emits each value count[v] times. Correct for bare integers, wrong for records.
  • Only practical when k is small. If k >> n, the O(k) initialization dominates and comparison sorts win.
  • It's the subroutine that makes radix sort correct, because radix sort requires stability at each digit pass.
  • Negative values work fine with a min-offset: shift by min, sort, shift back.
  • Signals: explicit bounded integer range in constraints, sort-by-frequency structure, n ≤ 10^6 with values in [0, n].

Recognizing algorithm signals under pressure is what interviews actually test. SpaceComplexity runs voice-based mock interviews with structured feedback on exactly that. Worth a look.

For the full complexity analysis behind why comparison sorts cannot beat O(n log n), Big-O Notation walks through the decision-tree argument in detail. For a head-to-head on merge sort versus quicksort, including the real-world constants and cache behavior, see Merge Sort vs Quicksort.

Further Reading