Custom Sort Comparator Bugs Your Tests Will Never Catch

June 6, 202610 min read
dsaalgorithmsinterview-prepleetcode
Custom Sort Comparator Bugs Your Tests Will Never Catch
TL;DR
  • a - b comparators overflow on Integer.MIN_VALUE; use Integer.compare(a, b) in Java or (a > b) - (a < b) in Python
  • Violating strict weak ordering (using >= instead of >) causes undefined behavior in C++ and wrong output in Java
  • JavaScript's .sort() is lexicographic by default; always pass (a, b) => a - b for numeric arrays
  • Sort is O(n log n) and often replaceable with O(n) hashing, counting sort, or QuickSelect when order isn't actually required
  • Stability varies by language: Python and Java object sorts are stable; Java primitive arrays and C++ std::sort are not
  • Counting sort needs non-negative indices; offset by min_value when input includes negatives
  • QuickSort degrades to O(n²) on sorted input when using a naive leftmost pivot

You call .sort(). Your five test cases pass. You ship it.

Then the interviewer adds one input and your custom comparator silently produces the wrong order. No exception. No stack trace. Just a subtly scrambled array that looks totally plausible until you trace through it with your finger on the screen and go "...wait."

Sorting feels like solved territory. That's exactly why its bugs are so dangerous. They don't fail loudly. They hide in edge cases your examples will never reach in a million years of normal testing, and they only surface when someone who specifically knows the failure mode decides to poke at it.

That someone, in an interview, is the interviewer.

Your Custom Sort Comparator Has a Hidden Time Bomb

The cleanest-looking integer comparator is a - b. Two characters, obvious logic, works on literally every input you'll try in development. Almost.

// Java - looks clean, silently wrong on extreme values Comparator<Integer> cmp = (a, b) -> a - b; int a = 1, b = Integer.MIN_VALUE; // b = -2,147,483,648 System.out.println(a - b); // → -2,147,483,647 (negative!) // Comparator says 1 < Integer.MIN_VALUE. That's backwards.

The bug is integer overflow. When b is near Integer.MIN_VALUE and a is a small positive number, the subtraction wraps around to a large negative value. Your comparator reports the wrong relationship, and the sort silently scrambles your output. No crash. No warning. Just wrong.

Integer overflow meme - programmer confidently wrong about arithmetic

"Passes all my tests" is doing a lot of heavy lifting when Integer.MIN_VALUE was never in those tests.

This never appears in typical test data. An interviewer who knows it will construct [Integer.MIN_VALUE, 1] specifically to catch it, watch you nod confidently, and write something in their notes.

The fix is replacing subtraction with comparison:

// Java - correct Comparator<Integer> cmp = Integer::compare; // or: (a, b) -> Integer.compare(a, b) // or: (a, b) -> (a < b) ? -1 : (a == b ? 0 : 1)
# Python - don't return a - b from a cmp function # Use: def cmp(a, b): return (a > b) - (a < b) # evaluates to 1, 0, or -1, no overflow

The (a > b) - (a < b) form evaluates two boolean subexpressions and subtracts them, producing 1, 0, or -1 with no arithmetic that can overflow. Note that JavaScript comparators use 64-bit floats, so (a, b) => a - b is safe there. This is a Java/C++ integer-specific trap.

See integer overflow in interview problems.

Breaking the Contract Without Knowing It

A comparator is not just a "return positive or negative" function. It's a formal contract, and sorting algorithms assume that contract holds to function correctly. Violate the contract and you get undefined behavior in C++, wrong output in Java, and crashes that only appear on large inputs.

The contract has three requirements. For a comparator cmp(a, b):

  • Irreflexivity: cmp(a, a) must return false. An element is never less than itself.
  • Antisymmetry: if cmp(a, b) is true, then cmp(b, a) must be false.
  • Transitivity: if cmp(a, b) and cmp(b, c), then cmp(a, c).

This set of properties is called strict weak ordering. The most common violation is >= instead of >, a typo with consequences:

// C++ - WRONG: cmp(5, 5) returns true, violates irreflexivity sort(arr.begin(), arr.end(), [](int a, int b) { return a >= b; });

When a == b, this comparator returns true for both cmp(a, b) and cmp(b, a). The sort interprets that as a contradiction and may swap elements indefinitely or overrun an array boundary. In C++, this is undefined behavior. The program might produce garbage or crash at an address nowhere near the sort call. Good luck debugging that.

A subtler version shows up in multi-field comparators:

// Java - compiles, runs, but second field is silently ignored Comparator<int[]> cmp = (a, b) -> { if (a[0] != b[0]) return Integer.compare(a[0], b[0]); return 0; // treats any two elements with equal first fields as equal };

If you mean to sort by a[0] then a[1], returning 0 when the first fields match leaves a[1] in arbitrary order. The fix:

Comparator<int[]> cmp = (a, b) -> { if (a[0] != b[0]) return Integer.compare(a[0], b[0]); return Integer.compare(a[1], b[1]); }; // Or, more concisely: Comparator.comparingInt((int[] x) -> x[0]).thenComparingInt(x -> x[1])

JavaScript's Default Sort Doesn't Know Numbers Exist

If you write .sort() with no argument in a JavaScript interview, congratulations. You just sorted your numbers as strings. By their first character. In Unicode order.

[40, 1, 5, 200, 30].sort() // → [1, 200, 30, 40, 5]

JavaScript sort meme

This is not a bug. It is, horrifyingly, working exactly as specified.

The ECMAScript spec defines the default sort as: convert each element to a string, then compare by UTF-16 code unit values. So "200" sorts before "30" because "2" < "3" in Unicode. Fully intentional, completely documented, and totally wrong for any numeric sorting you'll ever want to do.

Every numeric sort in JavaScript needs an explicit comparator:

[40, 1, 5, 200, 30].sort((a, b) => a - b) // → [1, 5, 30, 40, 200]

One historical note: V8 used an unstable QuickSort for arrays over 10 elements until 2018. All modern engines now use Timsort and guarantee stability since ECMAScript 2019. If you cite JavaScript sort instability in an interview, you're describing behavior from eight years ago. The interviewer will notice.

You're Sorting When You Should Be Hashing

Sorting is O(n log n), and many problems that look like sorting problems are actually O(n) hash problems wearing sorting as a disguise. Reaching for sort as a first step is a common tell that you haven't fully analyzed the problem structure.

A few patterns where sort is overkill:

Finding duplicates: sort, scan for adjacent equals. O(n log n). A hash set does it in O(n).

Two Sum: sort, two pointers. O(n log n). A single hash map scan is O(n).

kth largest: sort, index from the end. O(n log n). QuickSelect finds any order statistic in O(n) average. A min-heap of size k runs in O(n log k), which beats a full sort when k is small.

Frequency counting: sort, count runs. O(n log n). A hash map is O(n).

When values fall in a bounded range, counting sort beats comparison sort entirely. For n integers in [0, k], it runs in O(n + k). When k = O(n), that's linear. Most candidates never reach for it because they haven't internalized that the O(n log n) floor is a theorem about comparison-based sorting, not a law of physics. Comparison sorts provably cannot beat it in the general case.

The question to ask before every sort call: does this algorithm actually need ordered data, or does it just feel more tractable once the array is sorted? Those are different justifications, and only one earns the O(n log n) cost. See the full sorting algorithm comparison for when each one applies.

Stability Assumptions That Break Across Languages

Stability means a sort preserves the original relative order of equal elements. This sounds academic until it silently destroys your multi-key sort.

  • Python: always stable (Timsort)
  • Java objects: stable (Timsort via Arrays.sort for reference types)
  • Java primitives: not stable (dual-pivot QuickSort)
  • C++ std::sort: not stable. std::stable_sort is, at O(n log² n) worst case
  • JavaScript: stable in all modern engines since ECMAScript 2019

This matters when you're sorting by multiple criteria and assuming one sort preserves the order from a previous one.

Python lets you exploit stability to chain sorts:

# Sort people by last name, then first name within ties people.sort(key=lambda p: p.first_name) # secondary key, first pass people.sort(key=lambda p: p.last_name) # primary key, second pass # Works because Python guarantees stability: the first-name order # within each last-name group survives the second sort

This breaks silently in a language with an unstable sort. The language-agnostic safe version is a single comparator:

people.sort(Comparator.comparing(Person::getLastName) .thenComparing(Person::getFirstName));

If an interviewer asks about stability trade-offs: Java's Arrays.sort is stable for object arrays (Timsort) but not for int[], long[], or other primitive arrays (dual-pivot QuickSort). The JDK chose this deliberately. Primitives can't have identity, so "equal" elements are indistinguishable and stability is meaningless for them. Mentioning that is a strong signal.

See stable sort and multi-key sorting.

The Edge Cases Sorting Actually Introduces

Sorting doesn't eliminate edge cases. It shifts them somewhere less obvious, which is arguably worse.

All-same elements: Every pair comparison returns 0. Make sure nothing downstream assumes distinct values after sorting. Interval merging algorithms can behave unexpectedly when all intervals are identical.

Counting sort with negative numbers: Counting sort uses values as array indices. If the input can include negatives, offset by min_value before indexing:

def counting_sort(nums): offset = -min(nums) k = max(nums) - min(nums) + 1 count = [0] * k for n in nums: count[n + offset] += 1 result = [] for i, c in enumerate(count): result.extend([i - offset] * c) return result

The "Largest Number" all-zeros case: LeetCode 179 builds a string-concatenation comparator. For [0, 0], you get "00". The correct output is "0". One check at the end catches it:

from functools import cmp_to_key def largestNumber(nums): strs = list(map(str, nums)) def cmp(a, b): return (int(b + a) > int(a + b)) - (int(b + a) < int(a + b)) strs.sort(key=cmp_to_key(cmp)) return "0" if strs[0] == "0" else "".join(strs)

Naive QuickSort on pre-sorted input: Always picking the leftmost pivot degrades to O(n²) on sorted data. Every partition produces one element and n-1 remaining. Random pivot or median-of-three avoids it. Mentioning this when you discuss worst-case complexity signals you understand where the theory breaks down in practice.

The Recap

  • a - b comparators overflow for extreme integers. Use Integer.compare(a, b) in Java, (a > b) - (a < b) in Python.
  • A valid comparator must be irreflexive, antisymmetric, and transitive. Using >= violates irreflexivity and produces undefined behavior in C++.
  • JavaScript's .sort() is lexicographic without a comparator. Always pass (a, b) => a - b for numbers.
  • Sort is O(n log n). Hashing, counting sort, and QuickSelect offer O(n) for specific problem shapes. Check whether the problem actually needs ordered data before sorting.
  • Python and modern Java object sorts are stable. Java primitive arrays and C++ std::sort are not. For multi-key sorting across languages, use a single comparator with .thenComparing().
  • Counting sort needs non-negative indices. Offset by min_value for negative ranges.
  • QuickSort degrades to O(n²) on sorted input with a naive pivot.

If you want to practice catching these in a live interview, SpaceComplexity runs voice-based mock interviews with rubric scoring that covers exactly these correctness decisions. It's the fastest way to find out whether you'd actually catch the overflow bug under pressure, or just nod along.

Further Reading