Persistent Segment Tree: All Past Versions, O(log n) Per Query

Your standard segment tree has the memory of a goldfish. It knows what the array looks like right now. Ask it what the sum was three updates ago and it stares blankly. You'd have to replay every update from scratch. Nobody wants that.

Persistent segment trees fix this: every point update produces a new version, all versions survive intact, and any version answers a range query in O(log n). Your old tree still exists. You just have a new one too.

The trick is that you don't copy the entire tree on each update. You copy only the path from root to the changed leaf, O(log n) nodes, and point everything else at the previous version's nodes. An untouched subtree costs zero to "copy." It's the Git commit model applied to tree nodes, except it actually works in O(log n) instead of "however long your CI takes."

You reach for this structure when the problem asks about history. Find the k-th smallest element in A[l..r]. Query what the sum was after the k-th update. Undo any number of operations in O(log n) each. Or answer offline range queries that require knowing what the structure looked like at every past moment.

How Path Copying Works

A standard segment tree node holds three fields: an aggregated value (sum, min, max, or whatever your query needs), a left child pointer, and a right child pointer. A persistent segment tree uses exactly the same node shape. The difference is that nodes are never mutated. Once a node is written, it stays in memory forever, shared by every version that includes it.

The key fact is that updating any single leaf in a segment tree touches exactly one node per level of the tree, and nothing else. A tree over n elements has height ⌈log₂ n⌉. Every point update creates at most ⌈log₂ n⌉ + 1 new nodes: one for the leaf, one for each ancestor up to the root. For n = 10^6, that's 21 nodes. The other 1,999,979 stay exactly where they were.

Here is what happens step by step when you update position 1 in a four-element array.

Path copying: version 0 on the left, version 1 on the right. Three new nodes in blue, four shared nodes reused by pointer.

Update index 1 from 3 to 9. Three nodes are allocated. Four are shared. That's O(log n) work.

Version 1's root is new. Version 1's left child is new. Version 1's leaf at index 1 is new. Everything else, the right subtree [2-3: 12] and the leaf [0:1], is borrowed directly from version 0. No copies. Just pointers.

You store versions in an array of roots. Each root is a tiny new node; most of the tree underneath it is shared with older versions.

roots[0] ──► node_A  (covers [0-3], val=16)
roots[1] ──► node_E  (covers [0-3], val=22)
               │
               ├── node_F  (covers [0-1], val=10)
               │     ├── node_B  (covers [0], val=1)  ← same object as in version 0
               │     └── node_G  (covers [1], val=9)  ← new
               │
               └── node_C  (covers [2-3], val=12)     ← same object as in version 0
                     ├── node_D_left   …
                     └── node_D_right  …

Version roots array: roots[0] through roots[k] each point to a version root. Multiple roots share unchanged subtrees via pointers. O(n + q log n) total nodes.

Each update costs O(log n) new nodes. Query any version in O(log n) by passing its root.

This is called path copying, one of two classical techniques for making data structures persistent. (The other, "fat nodes," stores modification histories inside each node. Path copying is the standard choice for segment trees because the access cost stays O(log n) instead of O(log n · log q) for fat nodes.)

The technique was formalized by Driscoll, Sarnak, Sleator, and Tarjan in their 1986 paper "Making Data Structures Persistent" (Journal of Computer and System Sciences, 38(1):86-124), which manages to be both the most descriptively titled CS paper and the foundation of half the competitive programming leaderboard. They identified three flavors of persistence: partial (only the current version can be updated), full (any version can be updated, branching the version history), and confluent (versions can be merged). Persistent segment trees as used in practice are fully persistent, since you can branch off from any old root.

Core Operations and Why Each Complexity Holds

Operation	Time	Auxiliary Space
Build from array	O(n)	O(n)
Point update, create new version	O(log n)	O(log n)
Range query on any version	O(log n)	O(1)
Total space after q updates	N/A	O(n + q log n)

Build is O(n)

The initial build visits every node in the tree exactly once, bottom-up. A segment tree over n elements has exactly 2n − 1 nodes: n leaves and n − 1 internal nodes. Each node takes O(1) to compute (sum of children). Total: O(n).

Each update is O(log n) time and creates O(log n) new nodes

The path from root to any leaf passes through exactly ⌈log₂ n⌉ + 1 nodes. Every node on that path needs a new copy because its aggregate value changes. Every node off the path is identical to the previous version and is reused by pointer. So the update allocates exactly ⌈log₂ n⌉ + 1 nodes and does O(1) work per node. Both time and extra space are O(log n).

Range query is O(log n) with O(1) extra space

Querying a persistent version is identical to querying a standard segment tree. You pass the appropriate root and recurse, splitting or returning the aggregate at each node. Nothing about the shared-node structure affects correctness because no node is ever mutated after creation. Extra space is O(1). The call stack is O(log n), but that's true of any segment tree query.

Total space after q updates

Initial build allocates 2n − 1 nodes. Each of q updates allocates at most ⌈log₂ n⌉ + 1 new nodes. Total: 2n − 1 + q · (⌈log₂ n⌉ + 1) = O(n + q log n). For competitive programming problems where q = O(n), this is O(n log n) total nodes.

When pre-allocating a flat array (the index-based approach), the safe upper bound is 2 * n + q * (ceil(log2(n)) + 2). Most practitioners round up to 2 * n + q * 40 when n ≤ 10^6.

The Version-Difference Trick for Range Order Statistics

This is where persistent segment trees stop being a clever memory trick and start being a little absurd. The problem: given an array A of n integers, answer q queries of the form "what is the k-th smallest element in A[l..r]?"

Naively, you'd sort each subarray. O(n²) space and a sad look from your interviewer. With a merge sort tree you get O(log³ n) per query, which is fine until n hits 10^5 and the constant factor punches you. With a persistent segment tree you get O(log n) per query.

The key is to build the tree not over indices but over values. After coordinate compression (mapping n values to [0, n−1]), version[i] is a frequency segment tree that records "how many elements in A[1..i] have each compressed value." Version[0] is all zeros.

A = [3, 1, 2, 5, 4]  →  compressed = [2, 0, 1, 4, 3]

version[0]: all zeros
version[1]: freq[2] += 1  (inserted 3)
version[2]: freq[0] += 1  (inserted 1)
version[3]: freq[1] += 1  (inserted 2)
version[4]: freq[4] += 1  (inserted 5)
version[5]: freq[3] += 1  (inserted 4)

To answer "k-th smallest in A[l..r]": descend version[r] and version[l−1] simultaneously. At each node, compute left_count = version[r].left.cnt − version[l-1].left.cnt. That is the number of elements in A[l..r] that fall in the left half of the value range. If k ≤ left_count, go left. Otherwise, subtract left_count from k and go right. At the leaf, return the decompressed value.

Query: k-th smallest in A[2..4] (1-indexed), k = 2
       version[4].left.cnt  = 3   (elements ≤ mid in A[1..4])
       version[1].left.cnt  = 0   (elements ≤ mid in A[1..1])
       left_count           = 3

  If k=2 ≤ 3: recurse left, looking for 2nd smallest
  … and so on down to the leaf …

This technique is sometimes called the "Chairman Tree" (主席树) in Chinese competitive programming, a name that conveys maximum gravitas while explaining nothing about what it does. No matter. You never build the difference tree explicitly: you walk the two versions in lockstep and subtract at each node.

Chairman Tree: version[l-1] on the left, version[r] on the right. Both trees descended simultaneously. At each node, subtract left-child counts to find how many elements in A[l..r] fall in the left half of the value range, then decide which way to recurse.

Walk both trees at once. Subtract counts at every level. No new nodes allocated. O(log n) total.

One Structure, Every Language

from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    val: int = 0
    left: Optional[Node] = field(default=None, repr=False)
    right: Optional[Node] = field(default=None, repr=False)

def build(arr: list[int], lo: int, hi: int) -> Node:
    if lo == hi:
        return Node(val=arr[lo])
    mid = (lo + hi) // 2
    left = build(arr, lo, mid)
    right = build(arr, mid + 1, hi)
    return Node(val=left.val + right.val, left=left, right=right)

def update(prev: Node, lo: int, hi: int, pos: int, new_val: int) -> Node:
    if lo == hi:
        return Node(val=new_val)
    mid = (lo + hi) // 2
    if pos <= mid:
        new_left = update(prev.left, lo, mid, pos, new_val)
        return Node(val=new_left.val + prev.right.val, left=new_left, right=prev.right)
    new_right = update(prev.right, mid + 1, hi, pos, new_val)
    return Node(val=prev.left.val + new_right.val, left=prev.left, right=new_right)

def query(node: Node, lo: int, hi: int, ql: int, qr: int) -> int:
    if node is None or ql > hi or qr < lo:
        return 0
    if ql <= lo and hi <= qr:
        return node.val
    mid = (lo + hi) // 2
    return query(node.left, lo, mid, ql, qr) + query(node.right, mid + 1, hi, ql, qr)

# --- demo ---
arr = [1, 3, 5, 7, 9]
n = len(arr)
roots = [None] * (n + 1)
roots[0] = build(arr, 0, n - 1)
roots[1] = update(roots[0], 0, n - 1, 2, 99)   # set index 2 to 99

print(query(roots[0], 0, n - 1, 0, 4))  # 25
print(query(roots[1], 0, n - 1, 0, 4))  # 119

What Problems Does a Persistent Segment Tree Solve?

Range order statistics. The k-th smallest element in A[l..r] is the canonical application. Merge sort trees solve this in O(log³ n). Persistent segment trees bring it to O(log n). This shows up in problems involving ranked queries over arbitrary subarrays.

Historical state queries. The problem gives you an array and a sequence of point updates. Queries ask for the state of some range after the k-th update. A standard segment tree needs to process updates in order. A persistent one lets you jump to any version instantly.

Undo / rollback. Because you store roots, rolling back to any past state means switching which root you pass to query. No reconstruction. No reverse operations. Just roots[k].

Offline 2D range problems via scan-line. Some problems can be reduced to: sweep over a sorted coordinate, build one tree version per step, query version ranges. This is how you handle "sum of elements in rectangle with time constraints" without a 2D segment tree.

Persistent ordered sets on trees. For path queries on trees (e.g., k-th smallest on the path from u to v), combine Heavy-Light Decomposition with persistent segment trees. Each path segment becomes a version range, and queries walk O(log n) chains each with a O(log n) query.

Five Signals This Is the Right Tool

Signal 1: "k-th smallest in A[l..r]"

The word "k-th" combined with a subarray range almost always signals a persistent segment tree or a merge sort tree. Ask: does the problem allow offline processing? If yes, a merge sort tree at O(log³ n) might be fast enough. If the problem has n and q both around 10^5, you want O(log n) per query, which means persistent segment tree.

Signal 2: "After the k-th update, what is..."

Any time the problem mixes update-order with range queries (not just current state), persistence is the natural tool. With a plain segment tree, you'd have to replay all updates up to k. With a persistent one, you query roots[k] directly.

Signal 3: Large value range with frequency counting

If the problem asks about distributions over a large value range (like "how many elements in A[l..r] are between L and R"), the key observation is that frequency queries over a subarray are exactly what the Chairman Tree setup provides. Coordinate compress the values, build one version per prefix, query the difference.

Signal 4: "Any version" or "undo"

Explicit version language in the problem statement is a direct cue. Competitive programming problems sometimes say things like "restore the array to its state before the i-th query" or "find the answer at time t." These are solvable with an O(n log n) tree and O(1) version lookup.

Signal 5: You already need a segment tree, and the queries reference time

If you'd normally build a segment tree but the problem also asks time-indexed questions, consider adding persistence. The cost is an extra O(log n) factor in space and no change in time per operation.

Worked Example 1: K-th Smallest in Subarray (SPOJ MKTHNUM)

Problem: Given an array of n integers, answer q queries. Each query asks for the k-th smallest element in A[l..r].

Why persistent segment tree fits: A plain segment tree answers range sum queries. To find the k-th element, you'd need to binary search over the answer and count elements below it, taking O(n log n) per query. The persistent approach reframes the question: build a frequency tree over values (after coordinate compression), one version per prefix, then use the version-difference trick to answer each query in O(log n).

Setup:

Coordinate compress n values to indices [0, n−1].
Build version[0] = empty tree.
For i from 1 to n: version[i] = insert(version[i−1], compressed[i]) with a +1 frequency update.
For query (l, r, k): walk version[r] and version[l−1] together. At each node, left_cnt = version[r].left.cnt - version[l-1].left.cnt. If k <= left_cnt, recurse left; otherwise k -= left_cnt and recurse right.

Diagram: walking version[r] and version[l-1] simultaneously

version[l-1]   version[r]
      │               │
      ▼               ▼
  [0-3: 2]        [0-3: 5]
   /     \         /     \
[0-1:1] [2-3:1]  [0-1:3] [2-3:2]
                    ↑
        left_cnt = 3 - 1 = 2
        if k=2 ≤ 2: go left

Each query is O(log n). Total: O(n log n) build + O(q log n) queries.

Worked Example 2: Historical Range Sum Queries

Problem: Given an array, handle two operations: update(i, v) (set A[i] = v) and query(k, l, r) (return sum of A[l..r] as it was after the k-th update).

Why persistent segment tree fits: A standard segment tree handles current-state queries. Adding a "which version?" parameter means you just store a roots[] array. Each update creates a new version in O(log n). Each query addresses roots[k], also O(log n). Total space O(n + q log n).

The trap: some engineers try to answer version-k queries by replaying updates from scratch. That's O(k log n) per query, which is technically correct and practically a career choice you'll regret at q = 10^5. Persistence eliminates the replay entirely. You built the history as you went. Just use it.

Cheat Sheet

A persistent segment tree uses path copying: each update creates O(log n) new nodes, shares the rest.
Build is O(n) time and O(n) space. Each update is O(log n) time and O(log n) space. After q updates: O(n + q log n) total nodes.
Query any past version by passing its root: O(log n), identical to a standard segment tree query.
The Chairman Tree variant builds over values (not indices), one version per prefix, enabling k-th smallest in A[l..r] in O(log n).
The version-difference trick: walk two roots simultaneously, subtract counts at each node, never allocate new nodes.
Five pattern signals: k-th smallest in range, historical state queries, explicit undo, frequency distributions over large value ranges, segment tree + time indexing.
Pre-allocate 2*n + q*40 nodes when using the index-based flat-array approach.

Working through problems like this one under interview pressure, out loud, is a different skill from reading about them. SpaceComplexity runs voice-based mock interviews with rubric feedback across all four scored dimensions. If you want to practice explaining the version-difference trick to a real-time interviewer, that's what it's built for.

For more on structures that share this kind of logarithmic update pattern, see The Segment Tree: One Flat Array, Every Range Query, The Fenwick Tree: One Bit Trick to Rule Your Prefix Sums, and Binary Search: One Invariant to Rule the Search Space for the descent logic that underpins the k-th smallest walk.