Interval Tree: One Extra Field, Every Overlap Found in O(log n)

- Interval tree = a BST keyed on
interval.lowwith one extramax_highfield per node, storing the maximum high endpoint in the subtree. - The overlap predicate requires both
a.low <= b.highANDb.low <= a.high; one condition gives wrong answers on adjacent intervals. - INTERVAL-SEARCH goes left when
left.max_high >= query.low, otherwise right. This greedy rule is provably exhaustive, not just a heuristic. - Find-any runs in O(log n); find-all for k overlapping intervals runs in O((k+1) log n) using the same two pruning conditions.
- The CLRS augmentation theorem guarantees
max_highstays correct through rotations in O(1) extra work, keeping insert and delete at O(log n). - Real-world deployments include Linux kernel VMA tracking (pre-6.1), database temporal indexes, bioinformatics range queries, and computational geometry.
You have a thousand calendar events. A new booking arrives. Does it conflict with anything? You could scan all thousand. That works fine until you have ten thousand. Then a million. Then your calendar is a database and the scan is someone's production incident.
The O(n) approach, personified.
An interval tree answers "which stored intervals overlap this query?" in O(log n) per hit, using a single augmented field per BST node. The entire trick is one extra integer. Everything else follows from it.
The One Field That Powers Everything
Visually, an interval tree looks like a normal binary search tree. The BST ordering key is the low endpoint of each interval. Every node also stores one extra value, max_high, the maximum high endpoint across every interval in that node's subtree.
That field is the machine. It lets you prune entire subtrees in O(1) without inspecting a single interval inside them.
Two Intervals Overlap When Both Conditions Hold
Get the overlap predicate right first. Intervals [a, b] and [c, d] overlap when:
a <= d AND c <= b
Both conditions are required. Checking only one produces wrong answers on intervals that are adjacent but do not cross. [10, 15] and [16, 20] satisfy a <= d (10 <= 20) but do not satisfy c <= b (16 <= 15). They do not overlap. The one-condition version would say they do. Confidently and incorrectly.
Closed endpoints: [10, 15] and [15, 20] do overlap, at the single point 15.
Case 2 is the trap. The first condition passes. The second doesn't. Single-condition code would call it an overlap and confidently be wrong.
max_high Propagates Upward on Every Insert
Insert intervals one at a time using interval.low as the BST key, then update max_high along the path back to the root.
For any node x:
max_high[x] = max(x.interval.high,
max_high[x.left],
max_high[x.right])
The tree after inserting six intervals in order: [15,20], [10,30], [17,19], [5,20], [12,15], [30,40].
[15, 20] max=40
/ \
[10, 30] [17, 19] max=40
max=30 / \
/ \ (nil) [30, 40]
[5, 20] [12, 15] max=40
max=20 max=15
The max_high at the root, 40, is the high endpoint of [30,40]. It propagated up through [17,19] and then through [15,20]. Every node's max_high tells you, without visiting any descendants, the rightmost point any interval in this subtree reaches.
If a query interval starts after that point, nothing in the subtree can overlap it. You prune the whole branch.
Each node's max_high is the ceiling of all endpoints in its subtree. One integer. Entire branches pruned for free.
Does Balancing Break max_high?
A self-balancing BST (red-black tree, as in CLRS Chapter 14) rotates nodes on insert and delete. A rotation swaps parent and child. Does that break max_high?
No, and this is the rare case where "trust the textbook" is actually the right answer. The CLRS augmentation theorem states: if a field at each node depends only on the node itself and its direct children, rotations can maintain it in O(1) extra work per rotation. The max_high formula fits exactly. Insert and delete stay O(log n).
A plain unbalanced BST also works correctly. It just degrades to O(n) on sorted input. Production implementations (the Linux kernel's VMA tree before kernel 6.1, database range indexes) use a balanced BST underneath.
INTERVAL-SEARCH: Finding Any One Overlap
x = root
while x is not nil and x.interval does not overlap query:
if x.left is not nil and x.left.max_high >= query.low:
x = x.left
else:
x = x.right
return x
One decision per node. No backtracking. That's the whole algorithm.
At each node there is exactly one decision: go left or go right. The loop terminates when it finds an overlap or falls off the tree.
Going Right Is Trivial
When x.left.max_high < query.low, every interval in the left subtree ends before query.low. No left-subtree interval can overlap the query. Left is dead; go right.
Going Left Proves the Search Exhaustive
When x.left.max_high >= query.low, there exists some interval i' in the left subtree with high[i'] >= query.low. If you search the entire left subtree and find nothing, it means every candidate i' has low[i'] > query.high (because high[i'] >= query.low is already satisfied; the only remaining failure mode is that i' starts too late).
Now look at the right subtree. Every interval j in the right subtree has low[j] >= low[x] by BST order, and low[x] >= low[i'] > query.high. So every right-subtree interval also starts after the query ends. Going left is correct and exhaustive: when the left subtree comes up empty, the right subtree is provably empty too.
This is the key invariant the algorithm preserves at every step. It is not obvious, and it is why the greedy "go left if you can" rule is actually a provably exhaustive search. You don't need to reconstruct this proof in an interview. You do need to know that the direction choice is not a heuristic.
Walking Through Query [14, 16]
Query: [14, 16]. Root is [15, 20].
Step 1: node=[15,20]. Overlaps [14,16]? 15<=16 ✓ and 14<=20 ✓ → YES. Return [15,20].
Find-any terminates in one comparison. Find-all recurses through the whole tree with pruning. The following trace is long by design. That's how you know it's actually finding everything.
search_all([15,20], [14,16]):
max_high=40 >= 14 → proceed
recurse left into [10,30]:
max_high=30 >= 14 → proceed
recurse left into [5,20]:
max_high=20 >= 14 → proceed
recurse left → nil, stop
[5,20] overlaps [14,16]? 5<=16 ✓ and 14<=20 ✓ → add [5,20]
low=5 <= 16 → recurse right → nil, stop
[10,30] overlaps [14,16]? 10<=16 ✓ and 14<=30 ✓ → add [10,30]
low=10 <= 16 → recurse right into [12,15]:
max_high=15 >= 14 → proceed
recurse left → nil, stop
[12,15] overlaps [14,16]? 12<=16 ✓ and 14<=15 ✓ → add [12,15]
low=12 <= 16 → recurse right → nil, stop
[15,20] overlaps [14,16]? YES → add [15,20]
low=15 <= 16 → recurse right into [17,19]:
max_high=40 >= 14 → proceed
recurse left → nil, stop
[17,19] overlaps [14,16]? 17<=16? NO → skip
low=17 > 16 → PRUNE right subtree [30,40]
Results: [5,20], [10,30], [12,15], [15,20]
[30,40] is pruned entirely by the BST-low-endpoint check (node.interval.low <= query.high). Four results, zero wasted work on [30,40].
Interval Tree Time Complexity
| Operation | Time | Space |
|---|---|---|
| Build | O(n log n) | O(n) |
| Insert | O(log n) | O(1) extra |
| Delete | O(log n) | O(1) extra |
| Search any | O(log n) | O(1) |
| Search all | O((k+1) log n) | O(log n) stack |
| Space total | O(n) |
The O((k+1) log n) search-all bound holds because every overlapping interval costs O(log n) to reach, and the two pruning conditions together guarantee you never traverse a subtree that yields nothing. The +1 accounts for the final fruitless descent that confirms the search is over. Compare this to a naive approach of repeatedly calling find-any and removing the result: that destroys the tree and costs O(k log n) just in deletions, on top of the search cost.
For insert and delete with a balanced BST underneath, O(log n) holds in the worst case. With a plain BST it is O(h), which is O(log n) expected for random insertion order but O(n) worst case.
One Structure, Every Language
The implementations below use a plain BST for clarity. For production use with worst-case guarantees, wrap the node inside a red-black tree (Java's TreeMap, C++'s std::map, or a hand-rolled implementation). The insert, search, and search_all logic is identical; only the balancing changes.
from __future__ import annotations from typing import Optional class Interval: __slots__ = ("low", "high") def __init__(self, low: int, high: int) -> None: self.low = low self.high = high def overlaps(self, other: "Interval") -> bool: return self.low <= other.high and other.low <= self.high def __repr__(self) -> str: return f"[{self.low}, {self.high}]" class Node: __slots__ = ("interval", "max_high", "left", "right") def __init__(self, interval: Interval) -> None: self.interval = interval self.max_high = interval.high self.left: Optional[Node] = None self.right: Optional[Node] = None def _max_high(node: Optional[Node]) -> int: return node.max_high if node is not None else -1 def _update(node: Node) -> None: node.max_high = max( node.interval.high, _max_high(node.left), _max_high(node.right), ) def _insert(node: Optional[Node], interval: Interval) -> Node: if node is None: return Node(interval) if interval.low < node.interval.low: node.left = _insert(node.left, interval) else: node.right = _insert(node.right, interval) _update(node) return node def _search(node: Optional[Node], query: Interval) -> Optional[Interval]: while node is not None: if node.interval.overlaps(query): return node.interval if node.left is not None and node.left.max_high >= query.low: node = node.left else: node = node.right return None def _search_all( node: Optional[Node], query: Interval, result: list[Interval] ) -> None: if node is None or node.max_high < query.low: return _search_all(node.left, query, result) if node.interval.overlaps(query): result.append(node.interval) if node.interval.low <= query.high: _search_all(node.right, query, result) class IntervalTree: def __init__(self) -> None: self._root: Optional[Node] = None def insert(self, low: int, high: int) -> None: self._root = _insert(self._root, Interval(low, high)) def search(self, low: int, high: int) -> Optional[Interval]: return _search(self._root, Interval(low, high)) def search_all(self, low: int, high: int) -> list[Interval]: result: list[Interval] = [] _search_all(self._root, Interval(low, high), result) return result if __name__ == "__main__": tree = IntervalTree() for lo, hi in [(15, 20), (10, 30), (17, 19), (5, 20), (12, 15), (30, 40)]: tree.insert(lo, hi) print(tree.search(14, 16)) # [15, 20] print(tree.search_all(14, 16)) # [[5,20],[10,30],[12,15],[15,20]]
Where It Gets Used
The defining use case is anything where you store a collection of ranges and query which ones overlap a given range or point. Interval trees are the right tool when n is large enough that O(n) per query is unacceptable and the intervals change (insert/delete are O(log n), unlike sorted arrays where insert is O(n)).
Scheduling and calendar systems. Given a set of booked time slots, does a new booking conflict? Which existing meetings overlap a proposed block? Every calendar system with conflict detection uses some form of interval overlap query. An interval tree makes it O(log n) per check.
Operating system memory management. The Linux kernel tracks each process's virtual memory areas (VMAs) as intervals [start, end]. When a page fault fires, the kernel searches for which VMA contains the faulting address. Before kernel 6.1, this was a red-black tree augmented exactly as described here. Kernel 6.1 moved to the maple tree, but the interval-tree augmentation approach powered Linux for over two decades. Your laptop has been running on this algorithm since before you could drive.
Computational geometry and rendering. Stabbing queries ("which rectangles contain this point?"), clipping ("which polygons intersect this viewport?"), and collision detection all reduce to interval overlap queries, sometimes in multiple dimensions.
Database temporal joins. A table of rows valid over time periods [valid_from, valid_to]. Query: find all rows active during a given window. This is a pure interval overlap problem. Temporal databases build interval indexes on top of B-tree variants for exactly this reason.
Bioinformatics. Genomic intervals are everywhere. Given a set of gene annotations (chromosome position ranges), find all genes that overlap a sequencing read. Tools like BEDTools perform billions of these queries and use interval trees internally.
Network packet filtering. IP address ranges (CIDR blocks), port ranges in firewall rules, and ACL entries all define intervals. Finding which rules apply to a packet is an interval stabbing query.
When to Reach for an Interval Tree
Three signals that an interval tree is the right call:
Your data is a collection of ranges or intervals. Not individual points, not values, but things that span from one number to another. Time windows, address ranges, coordinate spans.
Your queries ask about containment or overlap. "Does anything contain this point?" and "Does anything overlap this range?" are both interval stabbing queries. If you find yourself writing a linear scan over ranges to check each one, you are doing an O(n) operation that an interval tree would make O(log n).
You need dynamic updates. If your interval set is static, a sorted array with binary search handles point stabbing in O(log n) and a segment tree or sparse table handles aggregate range queries. When intervals are inserted and deleted at runtime, the BST-backed interval tree holds the O(log n) guarantee through updates.
Worked Example 1: Booking Conflict Detector
Problem. A coworking space has a list of room bookings. Each booking is a half-open time interval [start, end) in minutes since midnight. When a new booking request arrives, find all existing bookings that conflict with it.
Two bookings [a, b) and [c, d) conflict when a < d AND c < b (using half-open intervals; swap to <= for closed). This is the overlap predicate in a different notation. Build the interval tree once as bookings are added. Each new-request query calls search_all(request.start, request.end - 1) (converting to closed intervals) and returns the conflicts.
Without the interval tree: O(n) per query. With n bookings and n queries, that is O(n²). With the interval tree: O(log n) build per booking, O((k+1) log n) per query where k is the number of conflicts.
In a coworking space with 10,000 bookings and zero or one conflict per query on average, the linear scan processes up to 10,000 intervals each time. The interval tree processes roughly 14 nodes. The linear scan has a meeting about it.
Worked Example 2: Database Temporal Query
Problem. An employee_contracts table has columns (employee_id, start_date, end_date). You want to find all employees who were actively employed during a reporting window [window_start, window_end].
An active contract during the window means the contract interval overlaps the window interval: start_date <= window_end AND window_start <= end_date. Build an interval tree keyed on start_date. For each reporting query, call search_all(window_start, window_end). The result set is exactly the active employees.
The alternative is a full table scan. With an index on start_date alone you can prune contracts that start after the window, but you cannot prune contracts that end before the window starts without max_high. The augmented field does exactly that pruning.
When a Segment Tree Is Better
Interval trees handle overlap queries on dynamic sets. If your problem involves range aggregate queries on fixed data (sum from index 3 to 7, max from index 0 to 100), a segment tree gives O(log n) query and O(log n) point update with simpler code and better cache behavior.
If your intervals are one-dimensional and you only need point stabbing (not range overlap), a sorted array with binary search on start and a separate check against end is sufficient and has better constants.
What You Should Take Away
The whole structure fits in six facts.
- An interval tree is a BST keyed on
interval.low, with one extra fieldmax_highper node. max_high[x] = max(interval.high, max_high[left], max_high[right]). That formula is the whole structure.- The overlap predicate requires both conditions:
a.low <= b.high AND b.low <= a.high. One condition gives wrong answers. - INTERVAL-SEARCH goes left if
left.max_high >= query.low, otherwise right. This is provably correct and provably exhaustive. - Find-any is O(log n). Find-all is O((k+1) log n). Both use the same tree.
- The CLRS augmentation theorem guarantees
max_highstays correct through rotations in O(1) extra work, keeping insert and delete at O(log n) on a balanced BST. - Production implementations use a balanced BST underneath (red-black tree, AVL tree). Plain BST degrades to O(n) on adversarial input.
- Real-world deployments: Linux kernel VMAs (pre-6.1), database temporal indexes, bioinformatics range queries, computational geometry.
If you are preparing for a systems design or DSA interview and want to practice explaining structures like this out loud, SpaceComplexity runs voice-based mock interviews with rubric-based feedback. Explaining why the going-left proof works under time pressure is a different skill from reading it.
Ranges and overlap queries. One extra field. O(log n). That's the whole deal.
For deeper reading on the balanced BSTs that underpin production interval trees, see AVL Tree vs Red-Black Tree. For range aggregate queries on static arrays where an interval tree would be overkill, The Segment Tree is the right structure instead.
Further Reading
- Interval tree on Wikipedia covers the full taxonomy including centered interval trees and segment trees.
- GeeksforGeeks: Interval Tree has an additional walkthrough of the CLRS algorithm with examples.
- Introduction to Algorithms, Chapter 14.3 is the primary source for the augmentation theorem and the correctness proof.
- LeetCode: Merge Intervals (#56) and Insert Interval (#57) are the classic problems where interval manipulation skills get tested.
- cppreference: std::set and ordered containers describes the underlying ordered BST that C++ interval tree implementations build on.