Bucket Sort Algorithm: O(n) Sorting When Data Is Uniform

Every sorting algorithm you've probably studied uses comparisons. Quicksort, merge sort, heapsort: all of them reduce to "is A greater than B?" And because of that, they're mathematically stuck. There's a theorem that says you need at least log₂(n!) comparisons to sort n arbitrary elements, and log₂(n!) is Θ(n log n) by Stirling's approximation. This isn't a hardware problem. It's an information-theory problem. You're not going to win with a faster CPU.

Bucket sort sidesteps the theorem entirely. Not by being cleverer about comparing, but by not comparing at all. You look at where a value falls in a known range and drop it directly into the right bin. Then you sort each bin. When the data is spread evenly, each bin holds roughly one element and the whole thing runs in O(n). The theorem still applies. Bucket sort just doesn't care, because it's playing a different game.

The Core Idea

Picture sorting a set of test scores from 0 to 100. You could dump them all in a pile and sort the whole mess, or you could grab ten labeled boxes: 0-9, 10-19, 20-29, and so on. Drop each score in the right box. Sort each box separately. Read them out left to right.

That's bucket sort. The sorting work spreads across many small buckets instead of one giant array. When the input is evenly distributed, those small buckets are nearly empty, and nearly-empty buckets need almost nothing to sort.

The canonical version sorts floating-point numbers in [0, 1). The procedure:

Create n empty buckets, where n is the number of elements.
For each element x, place it in bucket floor(x × n).
Sort each non-empty bucket with insertion sort (or any comparison sort).
Concatenate the buckets from index 0 to n-1.

For input drawn from a uniform distribution over [0, 1), the expected time is O(n). The catch is coming. It's always coming.

Walking Through an Example

Input: [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]

Ten elements, so ten buckets. Bucket index = floor(x × 10):

Bucket sort distribution: 10 values spread across 10 buckets, showing only a few buckets receiving elements while most stay empty

Bucket	Range	Elements
0	[0.0, 0.1)	(empty)
1	[0.1, 0.2)	0.17, 0.12
2	[0.2, 0.3)	0.26, 0.21, 0.23
3	[0.3, 0.4)	0.39
4	[0.4, 0.5)	(empty)
5	[0.5, 0.6)	(empty)
6	[0.6, 0.7)	0.68
7	[0.7, 0.8)	0.78, 0.72
8	[0.8, 0.9)	(empty)
9	[0.9, 1.0)	0.94

Sort bucket 1: [0.12, 0.17]. Sort bucket 2: [0.21, 0.23, 0.26]. Sort bucket 7: [0.72, 0.78]. Every other bucket had zero or one element, so sorting was free.

Concatenate: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94].

Total comparisons: about six. A comparison sort on ten elements needs at least 22 in the worst case. The difference scales dramatically as n grows.

O(n) Is Conditional (Read the Fine Print)

The O(n) average case rests entirely on the uniform distribution assumption. This is not a footnote. It's the whole deal.

With n elements and n buckets, each bucket is expected to hold exactly one element. Let k be the number of elements landing in a given bucket. Insertion sort on that bucket costs O(k²). The expected total cost across all buckets:

E[total] = n × E[k²]

Using indicator random variables (the full proof is CLRS section 8.4), E[k²] = 2 − 1/n under a uniform distribution. So the expected total sort cost is O(n). Add O(n) for distribution and concatenation, and the whole thing runs in O(n) expected time.

The catch: that "uniform distribution" isn't an implementation detail. It's a precondition. If your data clusters, the analysis falls apart completely.

Case	Time	When
Best	O(n + k)	Elements spread perfectly across k buckets
Average	O(n)	Uniform distribution, k = n
Worst	O(n²)	All elements land in one bucket

The worst case is O(n²) because all n elements pile into one bucket and insertion sort on n elements is O(n²). You created n empty buckets, paid the overhead, and ended up with a worse algorithm than quicksort. That's the tax for using the wrong tool.

Space Complexity

Bucket sort needs O(n + k) auxiliary space. That's n total elements distributed across k buckets, plus the bucket array itself. With k = n, that's O(n). You're explicitly trading memory for the ability to sort in linear time when the conditions are right. No free lunch.

Bucket Sort for General Ranges

The float-in-[0,1) version is clean but narrow. For data in an arbitrary range [min, max], you map each element to a bucket index based on where it falls in that range:

def bucket_sort(arr, num_buckets=10):
    if not arr:
        return arr
    min_val, max_val = min(arr), max(arr)
    if min_val == max_val:
        return arr
    bucket_size = (max_val - min_val + 1) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for num in arr:
        idx = min(int((num - min_val) / bucket_size), num_buckets - 1)
        buckets[idx].append(num)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

The min(idx, num_buckets - 1) guard is required. When num == max_val, the raw index equals num_buckets, which is out of bounds. The clamp prevents the crash. Miss that detail and you'll spend 20 minutes debugging an index error before realizing max_val is a valid input.

Bucket Sort vs Counting Sort vs Radix Sort

All three are non-comparison sorts that beat O(n log n) by exploiting something the general case can't assume.

Counting sort exploits a small integer value range. One counter per possible value, no sorting inside buckets at all. O(n + k) time where k is the range. Works beautifully when k is small. Completely useless when k is large relative to n (imagine counting sort on floating-point timestamps).

Radix sort exploits a fixed number of digits. It sorts digit by digit using counting sort as a subroutine at each position. O(d × (n + b)) where d is digit count and b is base. Works on any integers regardless of range, as long as the digit count stays bounded. No assumptions about distribution.

Bucket sort exploits a known continuous range with a uniform distribution. Flexible about what sort runs inside each bucket. Degrades hard when distribution is skewed.

The decision tree: integers with a small value range means counting sort. Large integers with bounded digit count means radix sort. Floats or data with a known uniform distribution means bucket sort. If none of those hold, accept your O(n log n) fate and move on. Quicksort is not a consolation prize.

Where Bucket Sort Shows Up in Interviews

Bucket sort rarely appears as "implement bucket sort." It shows up wearing a disguise, and recognizing the shape is the actual skill being tested.

Maximum Gap (LeetCode 164) is the textbook example. Find the maximum difference between consecutive elements in a sorted array, in O(n) time and space. The insight: create n-1 buckets over the range [min, max]. By the pigeonhole principle, at least one bucket must be empty when n elements span n-1 buckets. The maximum gap always crosses an empty bucket, so it always occurs between elements in different buckets, never within one. You only need each bucket's min and max. No comparison sort required. O(n) time, O(n) space.

Top K Frequent Elements (LeetCode 347) and Sort Characters By Frequency (LeetCode 451) use the frequency-bucket pattern. Build an array indexed by frequency (slots 1 through n). Put each element in the slot matching its count. Iterate from high-frequency to low. The O(n log n) sort disappears because the bucket index is already the frequency. No comparisons needed because frequency is an integer in a bounded range.

The signal to recognize: when you need linear-time ordering and each element has a natural mapping to a bounded integer index, you want buckets.

Recognizing these patterns in a live interview, under time pressure, while explaining your reasoning out loud, is a completely different skill than knowing the algorithm at your desk. SpaceComplexity runs voice-based DSA mock interviews where you talk through your approach the way you actually have to, and gives rubric-based feedback on your pattern recognition and communication, not just whether your code ran.

One Structure, Every Language

All implementations below sort floating-point numbers in [0, 1). Adapt the bucket index formula for other ranges using the general pattern above.

def bucket_sort(arr: list[float]) -> list[float]:
    n = len(arr)
    if n <= 1:
        return arr
    buckets: list[list[float]] = [[] for _ in range(n)]
    for num in arr:
        idx = min(int(num * n), n - 1)
        buckets[idx].append(num)
    result: list[float] = []
    for bucket in buckets:
        bucket.sort()
        result.extend(bucket)
    return result

Key Takeaways

Bucket sort beats O(n log n) only when the data is uniformly distributed over a known range. That's not a hedge, it's the entire proof.
Time complexity: O(n) average, O(n²) worst case. Space: O(n + k).
When data is skewed, use a comparison sort. Bucket sort with bad data is slower than quicksort with good data, and you'll have paid the extra overhead for the privilege.

For related non-comparison sorts, see the guides on counting sort and radix sort. For the complexity analysis framework that makes the O(n) proof precise, see recursion time complexity and the master theorem.