Merge Intervals: The Template Behind Every Scheduling Problem

May 26, 20269 min read
dsaalgorithmsinterview-prepleetcode
Merge Intervals: The Template Behind Every Scheduling Problem
TL;DR
  • Overlap condition via De Morgan's law: after sorting by start time, two intervals overlap when curr.start <= last.end
  • The max() bug: omitting max(last_end, curr_end) causes fully-contained intervals to silently truncate your result into wrong, valid-looking output
  • Insert Interval uses a three-phase O(n) linear scan instead of re-sorting an already-sorted list
  • Meeting Rooms II needs a min-heap of end times, or an event sweep with a tie-breaking rule: end events must fire before start events at the same timestamp
  • Non-overlapping Intervals is not merge intervals — it is greedy activity selection that sorts by end time, not start time
  • Five recognition signals: range pairs in input, merge/consolidate language, "minimum number of X to handle all Y", find-the-gaps framing, or insert-a-new-interval into a sorted list

The problem says "meeting rooms." Or "minimum platforms." Or "find the free slots between tasks." No intervals in sight. You spend fifteen minutes inventing something clever before realizing it's just merge intervals wearing a hat.

That's its whole trick. It shows up to the interview in a costume. Once you spot it once, you see it everywhere. The same template covers a surprising number of problems that look nothing alike on first read.


Five Signals This Is Merge Intervals in a Trenchcoat

1. The input has pairs representing ranges. Start/end, arrival/departure, left/right boundary. Any pair of numbers bounding a segment is an interval, regardless of what the problem calls it.

2. "Merge", "combine", "consolidate", or "union" appears. The direct tells. Usually paired with overlapping or adjacent ranges.

3. "Find the minimum number of X to handle all Y." Rooms, platforms, workers, servers. You need to cover all intervals simultaneously, so you need enough X to handle the peak concurrency. Merge intervals is involved, but the goal shifts to counting overlaps rather than merging them.

4. "Find gaps" or "find free time." The complement of merging. Merge all the busy intervals first, then read the gaps between consecutive merged results.

5. "Insert a new interval." If the existing list is already sorted and non-overlapping, don't sort again. This is the three-phase variant covered below.


What Does "Overlap" Actually Mean?

Before any template, you need the right overlap condition. Most people write it by feel and get it slightly wrong. Don't write it by feel. Derive it in reverse.

Start from non-overlap. Two intervals A = [a, b] and B = [c, d] fail to overlap in exactly two cases: A ends before B starts (b < c), or B ends before A starts (d < a). Together:

not_overlap = (a.end < b.start) OR (b.end < a.start)

Negate with De Morgan's law:

overlap = NOT((a.end < b.start) OR (b.end < a.start)) = (a.end >= b.start) AND (b.end >= a.start)

That's the closed-interval form. For [1, 3] and [3, 5], the check gives 3 >= 3 AND 5 >= 1, which is true. Touching endpoints merge. This is correct for most interview problems, but ask: "If one interval ends exactly when another begins, should they merge?" It's a real clarification question with a real answer that changes the equality sign.

After sorting by start time, you know a.start <= b.start always. The condition simplifies to just checking whether b.start <= a.end. That's the one comparison inside the template.


The Merge Intervals Template

def merge(intervals: list[list[int]]) -> list[list[int]]: intervals.sort(key=lambda x: x[0]) result = [intervals[0]] for start, end in intervals[1:]: last_end = result[-1][1] if start <= last_end: # overlap (touching counts) result[-1][1] = max(last_end, end) # ← max() is not optional else: result.append([start, end]) return result

O(n log n) for the sort, O(n) for the sweep. Space is O(n) for the result.

The max() call is the trap. Without it, fully-contained intervals silently corrupt the result.

Given [[1, 10], [2, 3], [4, 5]], after merging [2, 3] into [1, 10], you want result[-1][1] to stay 10, not drop to 3. If you write result[-1][1] = end instead of result[-1][1] = max(last_end, end), the next interval [4, 5] looks like it doesn't overlap (4 > 3), and you emit it separately. The output is [[1, 3], [4, 5]] instead of [[1, 10]].

The bug produces a valid-looking list. It has the right length. The intervals are sorted. No error is raised. Your tests pass if you only tested non-nested inputs. This is the kind of bug that ships to production because the output "looks fine."

Before fix: [[1, 10], [2, 3], [4, 5]] → [[1, 3], [4, 5]] ← wrong After fix: [[1, 10], [2, 3], [4, 5]] → [[1, 10]] ← correct

Side-by-side number line diagram: left panel shows result[-1][1]=3 (wrong, without max), producing [[1,3],[4,5]]; right panel shows max(10,3)=10 (correct, with max), producing [[1,10]]

Interviewer asks to draw a star pattern; coder prints each row manually without seeing the loop When you don't see the pattern, you reinvent it. Line. By. Line.


Insert Interval: Stop Re-Sorting

LeetCode 57 gives you a sorted, non-overlapping list and one new interval. The first instinct is to append the new interval and sort again. That's O(n log n) and throws away the structure you already have.

The input is already sorted. Three phases, each a single linear scan:

def insert(intervals: list[list[int]], new_interval: list[int]) -> list[list[int]]: result = [] i = 0 n = len(intervals) # Phase 1: all intervals ending before new_interval starts while i < n and intervals[i][1] < new_interval[0]: result.append(intervals[i]) i += 1 # Phase 2: merge all overlapping intervals into new_interval while i < n and intervals[i][0] <= new_interval[1]: new_interval[0] = min(new_interval[0], intervals[i][0]) new_interval[1] = max(new_interval[1], intervals[i][1]) i += 1 result.append(new_interval) # Phase 3: remaining intervals start after new_interval ends while i < n: result.append(intervals[i]) i += 1 return result

O(n) time, O(1) extra space. The output stays sorted because phase 1 appends in order, phase 2 appends one merged interval, and phase 3 appends the rest in order.

The merge condition in phase 2 is intervals[i][0] <= new_interval[1]. That's the overlap check: the existing interval's start falls within the (growing) new interval. Update both ends as you go.


When the Problem Wants a Count, Not a List

Here the question changes. You don't want the merged intervals. You want to know the maximum number of intervals active at the same moment, so you know how many rooms (or platforms, or machines) you need.

Approach 1: Min-Heap

Sort by start time. Keep a min-heap of end times representing currently occupied rooms.

import heapq def min_meeting_rooms(intervals: list[list[int]]) -> int: if not intervals: return 0 intervals.sort(key=lambda x: x[0]) heap = [] # end times of active meetings for start, end in intervals: if heap and heap[0] <= start: heapq.heapreplace(heap, end) # reuse the earliest-ending room else: heapq.heappush(heap, end) # need a new room return len(heap)

heap[0] is the smallest end time. If the current meeting starts after or at the same time the earliest room frees up, recycle it. Otherwise, open a new room.

Approach 2: Event Sweep (With a Tie-Breaking Trap)

Split every interval into two events: (start, +1) and (end, -1). Sort by time. Sweep while tracking a running count. The maximum is the answer.

def min_meeting_rooms_sweep(intervals: list[list[int]]) -> int: events = [] for start, end in intervals: events.append((start, 1)) # +1: a meeting begins events.append((end, -1)) # -1: a meeting ends # Sort by time. Tie-break: end events (-1) before start events (+1). events.sort(key=lambda x: (x[0], x[1])) max_rooms = 0 current = 0 for _, delta in events: current += delta max_rooms = max(max_rooms, current) return max_rooms

The tie-breaking on the sort key is not cosmetic. When an end event and a start event share the same timestamp, the sort key (time, delta) puts (5, -1) before (5, +1) because -1 < 1. This means: when one meeting ends at time 5 and another begins at time 5, the ending meeting vacates the room before the new one claims it. That room gets reused. Skip the tie-break, and you count an extra room for every back-to-back pair. The answer is off by one for exactly the cases where rooms are most efficiently shared.

The min-heap handles this implicitly because heap[0] <= start uses <=.


The Disguise Department: Spotting Variants in the Wild

"Employee Free Time" (LC 759): Multiple employees, each with a list of busy intervals. Find the gaps when everyone is simultaneously free. Flatten all intervals into one list, run the merge template, then read the gaps between consecutive merged results. The signal is "find free time" plus inputs organized as ranges.

"Minimum Number of Platforms" (classic railway problem): Given train arrival and departure times, how many platforms are needed so no train has to wait? Identical to Meeting Rooms II. "Minimum number of [shared resource]" plus conflicting time windows equals the counting variant, every time.

"Non-overlapping Intervals" (LC 435): This one is a trap. It asks how many intervals to remove so the rest don't overlap. It looks like merge intervals but it's actually a greedy activity selection problem: sort by end time, greedily keep as many non-overlapping intervals as possible, and the answer is n minus how many you kept. Sorting by start time (as you would for merging) gives the wrong answer here. This is the variant that bites people who pattern-matched too fast.

Leonardo DiCaprio at the Oscars, grinning knowingly: "When a recruiter asks me, a 10 YOE SWE, for a specific data structure I studied in uni and never ever used at work" Actually, you use it constantly. It just shows up wearing a different name.


Recognition Is the Hard Part

The execution is the easy part. You need reps where the pattern isn't announced, where the problem says "railway platforms" and you have ninety seconds to realize it's a heap problem in disguise. SpaceComplexity runs voice-based mock interviews where you narrate your reasoning live. That's the exact skill an interviewer scores: naming the pattern before you touch the keyboard.


Recap

  • Overlap condition: derive via De Morgan's law: intervals overlap iff a.end >= b.start AND b.end >= a.start. After sorting, that reduces to curr.start <= last.end.
  • Touching endpoints ([1,3] and [3,5]) usually merge. Clarify with your interviewer.
  • Core template: sort by start, sweep, update result[-1][1] = max(last_end, curr_end). The max() handles full containment. Without it, contained intervals silently truncate the result.
  • Insert Interval: three-phase linear scan (before, merge, after). Don't re-sort sorted input.
  • Counting problems (minimum rooms): min-heap of end times, or event sweep. Event sweep has a tie-breaking requirement: sort (time, -1) before (time, +1) so end events fire before start events at the same timestamp.
  • Non-overlapping Intervals is not merge intervals. It's greedy activity selection. Sort by end time, not start time.

Further Reading