What Is a Segment Tree? The Data Structure Behind Range Queries

You have an array. It is just sitting there, minding its business. Someone asks you the sum of elements from index 2 to 7. Easy. You write a prefix sum, pat yourself on the back, and feel like a real programmer.

Then they update index 4.

Prefix sums answer range queries in O(1), but a single mutation invalidates the entire precomputed array. A segment tree solves both sides at once: O(log n) queries and O(log n) updates, no sacrifices, no crying.

Two Naive Approaches, Two Ways to Suffer

A prefix sum array is genuinely great. O(1) range queries, built in linear time, elegant. The catch: when one element changes, you have to rebuild every prefix sum from that index forward. That is O(n) per update. If your array never changes, prefix sums are the correct call and you should stop reading and go home. Add any mutation and the costs add up embarrassingly fast.

The other option is just... not preprocessing anything. Iterate over the range for every query. O(n) per query. Works fine for a small number of queries. Does not work fine when a LeetCode problem issues ten thousand of them and you start doing math on how long O(n * q) actually takes.

The segment tree trades a constant factor for balanced guarantees. Both queries and updates run in O(log n), which is the answer whenever mutations and range aggregates have to coexist.

The Core Insight: Divide the Range

A segment tree is a binary tree where each node stores an aggregate value for a contiguous range of the array. The root covers the entire array. Its left child covers the left half, right child the right half. Keep splitting until each leaf represents a single element.

That is the whole idea. To answer "what is the sum of indices 3 through 9?", you do not scan each element. You stitch together precomputed answers from a small handful of nodes whose ranges compose exactly what you need.

For an array of length n, the tree has height log n, and no query ever visits more than 4 log n nodes. That bound sounds handwavy until you trace a real example and realize it just works out that way at every level.

Tracing a Query on a Real Array

Take this array:

index: 0  1  2  3  4  5
value: 1  3  5  7  9  11

Build a sum segment tree. The root stores 36. Left child stores 9 (indices 0 to 2), right child stores 27 (indices 3 to 5):

Segment tree for array [1, 3, 5, 7, 9, 11] with query range [2, 4] highlighted in green

Query [2, 4]: two green nodes answer it completely. Four total nodes visited.

Query: sum of indices 2 to 4.

The algorithm descends from the root, comparing the query range [2, 4] against each node's range:

Root [0-5]: partially overlaps. Recurse into both children.
Left child [0-2]: partially overlaps. Recurse.
Left-left [0-1]: no overlap. Return 0.
Left-right [2-2]: fully contained. Return 5.
Right child [3-5]: partially overlaps. Recurse.
Right-left [3-4]: fully contained. Return 16.
Right-right [5-5]: no overlap. Return 0.

Result: 5 + 16 = 21. Four nodes visited instead of scanning all six elements. At n = 1,000,000, this difference is not academic.

How a Segment Tree Is Stored (Spoiler: No Pointers)

Here is the part that surprises people: you do not build an actual tree with node objects and pointers. A segment tree lives in a flat array, exactly like a binary heap.

Index the tree starting at 1. Root sits at index 1. For any node at index i, left child is at 2i and right child is at 2i+1. No pointers, no dynamic allocation, nothing fancy. The array needs at most 4n slots, which is a safe overestimate for when n is not a power of 2.

(Yes, 4n feels generous. It is fine. Move on.)

class SegmentTree:
    def __init__(self, nums: list[int]):
        self.n = len(nums)
        self.tree = [0] * (4 * self.n)
        if self.n > 0:
            self._build(nums, 1, 0, self.n - 1)

    def _build(self, nums: list[int], node: int, start: int, end: int) -> None:
        if start == end:
            self.tree[node] = nums[start]
            return
        mid = (start + end) // 2
        self._build(nums, 2 * node, start, mid)
        self._build(nums, 2 * node + 1, mid + 1, end)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def query(self, left: int, right: int) -> int:
        return self._query(1, 0, self.n - 1, left, right)

    def _query(self, node: int, start: int, end: int, left: int, right: int) -> int:
        if right < start or end < left:
            return 0  # out of range: identity element for sum
        if left <= start and end <= right:
            return self.tree[node]  # fully contained: use stored value
        mid = (start + end) // 2
        return (
            self._query(2 * node, start, mid, left, right) +
            self._query(2 * node + 1, mid + 1, end, left, right)
        )

    def update(self, index: int, value: int) -> None:
        self._update(1, 0, self.n - 1, index, value)

    def _update(self, node: int, start: int, end: int, index: int, value: int) -> None:
        if start == end:
            self.tree[node] = value
            return
        mid = (start + end) // 2
        if index <= mid:
            self._update(2 * node, start, mid, index, value)
        else:
            self._update(2 * node + 1, mid + 1, end, index, value)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

Build visits every node once. There are roughly 2n nodes, so build is O(n). The flat-array layout also gives good cache locality since parents and children sit close in memory, which is a nice bonus for something this recursive-looking.

Queries and Updates Both Run in O(log n)

The height of the tree is ceil(log2(n)). Every operation travels from root toward the leaves, and neither goes deeper than that.

Update is the simpler case. You descend one path from root to the target leaf, recomputing every ancestor on the way back up. Exactly log n nodes touched. Clean, predictable, no surprises.

Query is slightly more subtle. At each level, the query range can partially overlap at most two nodes: one on the left boundary, one on the right. Every other node at that level is either fully inside (return immediately) or fully outside (return 0). The total nodes visited across all levels is bounded by 4 log n. It takes a minute to convince yourself this is true, and then you just accept it and move on.

Space is O(n). The 4n array is a constant-factor overestimate that handles non-power-of-two sizes without drama.

Any Associative Operation Works

Sum is the textbook example, but the structure does not care. A segment tree works for any operation that is associative and has an identity element.

Minimum: nodes store the minimum in their range, identity is positive infinity. Maximum: identity is negative infinity. GCD: identity is 0. Bitwise AND: identity is all 1s. Product modulo a prime: identity is 1. Count of elements satisfying a property: identity is 0.

You only need to change what nodes store and what you return for out-of-range cases. The shape of build, query, and update is identical every time.

More advanced segment trees support range updates using lazy propagation. Instead of immediately pushing an update down to all affected leaves, you mark a node lazily and propagate downward only when a query forces you to. This keeps range updates in O(log n) as well. Lazy propagation is harder to implement and most interviews skip it, but knowing it exists is useful when someone asks "what if you need to increment every element in a range?"

The Interview Signal Is Actually Clear

Segment trees have a reputation for being one of those "you either know it or you don't" topics. The recognition trigger is not actually that subtle.

The signal: the problem has both range queries and point updates on a mutable array. Static array with no updates? Prefix sum, O(1), done. Point queries only, no ranges? Plain array. Mutations plus range aggregates? Segment tree territory.

LeetCode 307 (Range Sum Query - Mutable) is the canonical example. You will also see it disguised: "count elements satisfying some property in a range after updates," or any problem that mixes mutable data with aggregate range questions. Range minimum query shows up constantly in interval scheduling, sliding window variants, and problems involving nested ranges.

The tell is the word "mutable." Once you spot it, the rest follows.

Which Structure to Reach For

Structure	Build	Query	Update	When to use
Prefix sum	O(n)	O(1)	O(n)	Static arrays only
Segment tree	O(n)	O(log n)	O(log n)	Min, max, GCD, or range updates
Fenwick tree	O(n)	O(log n)	O(log n)	Sum queries with point updates

If you only need range sums with point updates, prefer the Fenwick tree. Same asymptotic complexity, significantly simpler code, smaller constant factor. The segment tree wins when you need a different aggregate function, when you want lazy propagation for range updates, or when you want one mental model that covers everything and you are tired of learning new things.

For a detailed side-by-side, see the segment tree vs Fenwick tree breakdown. If your array is completely static and you need range minimums, a sparse table gives you O(1) queries with O(n log n) build and zero update capability at all.

Talking About It While Someone Watches

Understanding the structure at a desk is one thing. Explaining your data structure choice out loud, in real time, while typing under a timer, is a different skill entirely.

SpaceComplexity runs voice-based mock interviews where you have to reason about tradeoffs out loud, not just implement. When an interviewer asks "why not a prefix sum?" or "could you use a different structure here?", you need the answer before your hands reach the keyboard. That conversational fluency is what separates candidates who understand segment trees from candidates who memorized the code.