Top-K Elements All Look Different. The Heap Signal Isn't.

You're three minutes into a problem. The constraints say n can be up to 10^5, k is "much smaller" than n, and the question asks for the k most frequent words. You reach for a sort. It works. But it's O(n log n) and your interviewer is writing something down.

The heap pattern solves every top-K variant in O(n log k). That sounds like a marginal win until you look at the numbers. When n = 100,000 and k = 10, you've replaced roughly 1.7 million operations with 230,000. And a lot of top-K problems involve streams where you never have all the data at once. You cannot sort a stream.

The hard part is recognition. Problems that need this solution don't all say "top K." Some say "kth largest in a stream." Some say "k closest points to origin." One asks you to schedule CPU tasks. All of them reach for the same data structure. Learn the signal once and you see through all of them.

Interview meme: "My grandma runs faster than your code"

The moment you propose bubble sort in a coding interview.

Five Signals That Point to a Heap

The clearest signal is any mention of k with a ranking. "K largest," "kth smallest," "k most frequent," "k closest," "k pairs with smallest sums." If the problem cares about a ranked subset of n elements, think heap.

The word "stream" is the silent tell. If input arrives incrementally and you need a running statistic, sorting is off the table. A min-heap of size k holds your running top-k with O(log k) per insert. That's the entire data structure requirement for problems like "kth largest in a stream" (LC 703).

The other signals, roughly in order of subtlety:

"Find the median from a data stream" (two-heap variant, covered below)
"Merge k sorted lists/arrays" (k-way merge)
"Given n tasks with a cooldown, find minimum CPU time" (frequency + max-heap greedy)
Any problem where the constraint block specifies k explicitly and k is much smaller than n

That last one is a constraint-reading habit worth building. When k appears in constraints and the problem doesn't say "sort the output," the O(n log k) heap approach is almost certainly the intended solution. The heap data structure is purpose-built for this: O(log n) insert and removal, O(1) access to the minimum or maximum.

The Template Feels Backward. That's the Point.

Here's the counterintuitive flip that trips nearly everyone: to find the K largest elements, you maintain a min-heap of size K. Not a max-heap.

Yes. To keep the biggest things, you track the smallest. Your brain says this is wrong. Your brain is bad at heaps.

The logic becomes clear once you see the invariant. A min-heap of size K stores exactly the K largest elements seen so far. The root, the minimum of those K, is the Kth largest overall. When a new element arrives, push it. If the heap exceeds size K, pop the minimum. You're evicting the weakest member of your current top-K to make room for someone who might deserve a spot.

import heapq

def top_k_largest(nums: list[int], k: int) -> list[int]:
    heap: list[int] = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)   # evict the smallest of the top-k
    return sorted(heap, reverse=True)

O(n log k) time. O(k) space. The heap never grows past K.

The wrong turn most candidates take: build a max-heap of all n elements, then pop K times. That's O(n + k log n) time and O(n) space. Still faster than sorting for small K, but it holds all n elements in memory. For a stream of a million numbers, that's a non-starter.

The symmetric rule: for the K smallest elements, maintain a max-heap of size K. Push each element (negated in Python, since heapq is always a min-heap). When size exceeds K, pop the maximum, evicting the largest element seen so far. What remains is the K smallest.

Three Forms, One Underlying Template

Frequency-based top-K. LC 347 (Top K Frequent Elements). Build a frequency map first, then run the heap template keyed on (count, value).

from collections import Counter
import heapq

def top_k_frequent(nums: list[int], k: int) -> list[int]:
    freq = Counter(nums)
    heap: list[tuple[int, int]] = []
    for val, count in freq.items():
        heapq.heappush(heap, (count, val))
        if len(heap) > k:
            heapq.heappop(heap)
    return [val for count, val in heap]

There is a bucket sort O(n) alternative for this specific problem. Since frequencies are bounded by n, create n+1 buckets indexed by frequency and sweep from high to low, collecting elements until you have k. The hash map lookup is O(1) and no heap is needed. Reach for this only if your interviewer explicitly asks whether you can beat O(n log k). Most won't.

K-way merge. Given K sorted lists, merge them in sorted order. The heap holds one element per list as a (value, list_index, element_index) tuple. Pop the minimum, output it, advance that list's pointer, push the next element.

import heapq

def merge_k_sorted(lists: list[list[int]]) -> list[int]:
    heap: list[tuple[int, int, int]] = []
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))

    result: list[int] = []
    while heap:
        val, i, j = heapq.heappop(heap)
        result.append(val)
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return result

O(n log k) where n is total elements across all lists. The heap size stays exactly K throughout.

Two heaps for a running median. LC 295. Split the stream into two halves: a max-heap for the lower half and a min-heap for the upper half. After every insertion, the sizes differ by at most 1. The median is the top of the larger heap, or the average of both tops when sizes are equal.

import heapq

class MedianFinder:
    def __init__(self) -> None:
        self.lo: list[int] = []   # max-heap via negation
        self.hi: list[int] = []   # min-heap

    def addNum(self, num: int) -> None:
        heapq.heappush(self.lo, -num)
        # push lo's max to hi to maintain ordering invariant
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        # rebalance: lo must be >= hi in size
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def findMedian(self) -> float:
        if len(self.lo) > len(self.hi):
            return float(-self.lo[0])
        return (-self.lo[0] + self.hi[0]) / 2.0

The rebalancing order matters. Push to lo first, always move lo's max to hi, then check if hi grew too large. Get the order wrong and you'll have elements in the wrong half, and the median calculation will silently produce wrong answers. Which you will notice at the worst possible time.

Sliding Windows Add a Catch. Here It Is.

Once you add a heap to your solution, a common follow-up is: "now let the window slide." Elements leave the window as new ones enter. But heaps don't support O(log n) removal of an arbitrary element. Removal from the middle is O(n).

The fix is lazy deletion: don't remove elements immediately. Store (value, index) pairs in the heap. When you query the root, check if its index falls within the current window. If not, pop and discard. Repeat until the root is valid.

import heapq

def sliding_window_max(nums: list[int], k: int) -> list[int]:
    heap: list[tuple[int, int]] = []
    result: list[int] = []
    for i, num in enumerate(nums):
        heapq.heappush(heap, (-num, i))
        window_start = i - k + 1
        # discard elements that have left the window
        while heap[0][1] < window_start:
            heapq.heappop(heap)
        if i >= k - 1:
            result.append(-heap[0][0])
    return result

Stale elements buried deep in the heap are harmless until they surface. When they reach the top, you discard them before reading the root value. The amortized cost of all those discards is O(n log n) across the full array, since each element is pushed and popped at most once. This is why sliding window plus heap problems end up O(n log n) rather than the O(n) you might hope for.

When the O(n log k) Advantage Disappears

The advantage over O(n log n) sorting depends on k being much smaller than n. When k approaches n, log k approaches log n and the gap closes. At that point you might as well sort.

One non-obvious optimization: when k > n/2, flip the problem. Finding the top-K largest is equivalent to finding the bottom (n-K) smallest and excluding them. When k > n/2, n-K < n/2, so a heap of size n-K is cheaper than a heap of size K.

Uno reverse card in soccer match

When k > n/2: just Uno reverse the problem.

When you need only the Kth element (not all K elements) on a static array, quickselect runs in O(n) average time with O(1) extra space, beating the O(n log k) heap. The trade-off: quickselect has O(n²) worst case and cannot handle streams. For streams or when K elements must be returned in sorted order, the heap wins.

What Your Language Is Hiding

Python. heapq is min-heap only. Negate values for max-heap behavior. Watch the negation edge case with float('inf'): negating gives float('-inf'), which silently inverts your ordering. For non-numeric types, use a wrapper class with __lt__ reversed instead of negation.

The sneakier Python trap: push (priority, item) tuples and your code works fine in testing. Under production load, the moment two items share the same priority, Python tries to compare the item objects to break the tie. If those objects lack __lt__, you get a TypeError buried inside heapq._siftdown with no application code in the traceback. You will discover this at 2am on a Friday. The canonical fix is a three-tuple with a counter: (priority, next(counter), item) where counter = itertools.count(). The monotonically increasing counter guarantees ties are always resolved at position two, so your objects are never compared.

import heapq
import itertools

counter = itertools.count()
heap: list = []
heapq.heappush(heap, (priority_value, next(counter), task_object))

Java. PriorityQueue is a min-heap by default. Pass Collections.reverseOrder() for a max-heap, or a lambda comparator for custom ordering. Comparator.comparingInt(x -> x.frequency) is cleaner than a subtraction-based comparator, which overflows for Integer.MIN_VALUE inputs.

C++. std::priority_queue is a max-heap by default. For a min-heap: priority_queue<int, vector<int>, greater<int>>. The greater<> trips people because "greater comparator means min-heap" is counterintuitive. The comparator describes how to order elements in the priority queue, and pushing "greater" elements to the top of comparison means smaller values end up at the queue front.

The Template Is Easy. Recognition Is the Hard Part.

Knowing the template is half the job. The other half is seeing "k closest points to origin" and immediately reaching for a max-heap of size k keyed on squared distance. That leap from problem description to data structure only comes from drilling under time pressure. SpaceComplexity puts you in a voice-based interview where you have to name the pattern out loud, defend it, and handle follow-ups like "what if k is larger than n/2?" in real time. Reading about heaps does not train that reflex. Doing it does.

Quick Recap

Signal: any mention of k with a ranking, "stream" plus ordering, k much smaller than n
Top-K largest: min-heap of size k; pop when size exceeds k
Top-K smallest: max-heap of size k; pop when size exceeds k
Frequency-based: build a frequency map first, then heap on (count, value)
Running median: max-heap for lower half, min-heap for upper half, sizes differ by at most 1
K-way merge: heap of size k with one (value, list, index) triple per list
Sliding windows: lazy deletion with (value, index) pairs, discard stale at query time
When k > n/2: flip the problem, find the (n-k) opposite-end elements instead
When you need only the Kth element on a static array: quickselect over heap
Python tuple trap: use (priority, counter, item) to avoid TypeError on equal priorities