Mo's Algorithm: Process Any Offline Range Query in O(n√n)

You have an array and a hundred thousand queries. Each query asks for the number of distinct elements in some subrange [L, R]. A segment tree gets you O(log n) per query for sums, minimums, maximums. But distinct count? No segment tree node can store that in a mergeable way. You're looking at O(n) per query naively, which is O(nQ) total: ten billion operations for n = Q = 10^5. That's a time limit exceeded in every language on the planet. The judge doesn't care that your code is "conceptually elegant."

Mo's algorithm solves this class of problem in O((n + Q)√n) by doing something that sounds almost too simple: it reorders the queries. No extra data structure. No lazy propagation. Just a specific sort order that makes a sliding window over the array nearly free to maintain.

The mental model: imagine dragging two pointers across an array. Every time you move a pointer one step, you call an add or remove hook that updates your answer in O(1). Mo's algorithm chooses which queries to process in which order so the total number of pointer steps is O(n√n) instead of O(nQ). You're not computing anything differently. You're just doing the same work in a smarter sequence.

The Core Idea: Reorder, Then Slide

The algorithm is offline. You receive all Q queries upfront, sort them into a clever order, and then answer them one by one by sliding a window [cur_l, cur_r] from the previous query's range to the current query's range.

The insight is that consecutive queries in Mo's order are always close together, so the window barely has to move.

If you processed queries in random order, the total pointer movement could be O(nQ). Mo's order cuts it to O(n√n). You're not doing less work per query. You're just picking an order where the total work collapses. It feels like cheating. It is not cheating.

Block Decomposition

Split the array indices [0, n-1] into blocks of size B (we'll derive the optimal B shortly). Block k contains indices [k*B, (k+1)*B - 1].

Every query (L, R) belongs to the block containing its left endpoint: block(L) = L / B.

Array split into four color-coded blocks of size B = √n, showing which indices belong to each block

Block number = L / B. All left endpoints in the same block are at most B apart.

The Sort Order

Sort queries by (block(L), R). Queries in the same block are sorted by R. The refinement that halves constant factors: alternate the direction of R for consecutive blocks. Even-numbered blocks sort R ascending; odd-numbered blocks sort R descending. This "zigzag" eliminates the O(n) reset cost at every block boundary.

Block 0 (even): sort R ascending   →  0 ... n
Block 1 (odd):  sort R descending  →  n ... 0
Block 2 (even): sort R ascending   →  0 ... n
...

Without zigzag, when you finish block 0 with R near n and block 1 wants R near 0, you pay O(n) to schlep the right pointer all the way back. With zigzag, block 1 starts from the end and works backwards, so R is already where it needs to be. Free movement. Every competitive programmer who skips the zigzag loses a factor of two and wonders why their solution is slow.

Four rows showing zigzag R traversal: even blocks go 0→n (right arrow), odd blocks go n→0 (left arrow), with explanation of why no reset cost occurs at boundaries

The zigzag cuts the constant factor roughly in half by eliminating the backtrack at every block boundary.

The Sliding Window Invariant

At all times, you maintain:

cur_l, cur_r: the current window boundaries
freq[]: a frequency map of elements in arr[cur_l..cur_r]
Whatever aggregate you care about (distinct count, sum of squares, etc.)

To move from the current window to query (l, r), you call add or remove for each index the window crosses:

while cur_r < r:  cur_r += 1; add(cur_r)
while cur_l > l:  cur_l -= 1; add(cur_l)
while cur_r > r:  remove(cur_r); cur_r -= 1
while cur_l < l:  remove(cur_l); cur_l += 1

Always expand before contracting. This order prevents the window from becoming invalid (where cur_l > cur_r + 1) during adjustment.

Two-panel diagram showing a window at [2,8] expanding right to include indices 9-12, then shrinking left from index 2-3, to reach target window [4,12]

Expand right first, then left, then shrink. The window never becomes temporarily empty or inverted.

Why O(n√n) Holds: The Proof

Let B be the block size. We have ceil(n/B) blocks.

Right Pointer Movement

Within each block, queries are sorted monotonically by R (ascending or descending, thanks to zigzag). So R never backtracks within a block. It traverses at most n positions per block.

Total right pointer movement = (number of blocks) × n = (n/B) × n = n²/B.

Left Pointer Movement

All L values within the same block fall in [k*B, (k+1)*B - 1]. So consecutive queries in the same block have |L_next - L_prev| ≤ B.

Total left pointer movement = (number of queries) × B = Q × B.

The Right Block Size

Total work = O(n²/B + Q×B). Take the derivative with respect to B and set to zero:

d/dB (n²/B + Q*B) = -n²/B² + Q = 0
B = n / √Q

When Q ≈ n, this gives B = √n, and the total is O(n√n + n√n) = O(n√n).

When Q is much smaller than n, use B = n/√Q to get O(n√Q) instead of the naive O(nQ).

In practice, competitive programmers often hardcode B = sqrt(n) or tune it empirically. Block sizes of 700 or 800 may beat exactly 750 due to cache effects.

Side-by-side boxes: left shows right pointer math (n/B blocks × n each = n²/B), right shows left pointer math (Q queries × B each = Q·B), with the minimisation formula B=n/√Q at the bottom

Both terms equal O(n√n) at B = √n. The derivation is just setting two functions equal and solving.

Complexity at a Glance

Operation	Time	Space
Sort queries	O(Q log Q)	O(Q)
Total pointer moves (right)	O(n²/B) = O(n√n)	O(1)
Total pointer moves (left)	O(Q·B) = O(Q√n)	O(1)
Overall (Q ≈ n)	O(n√n)	O(n + Q)
Per `add`/`remove` call	O(1) amortized	O(n) for freq map
Single query, online	O(n)	O(1) extra
Segment tree for composable queries	O(Q log n)	O(n)

If your add/remove hooks are O(1), Mo's whole framework costs O(n√n), no matter how complex the aggregate is. That's the power of the approach. Any segment tree would need the aggregate to be composable, which distinct count is not.

For the "Mo's algorithm with updates" variant, you add a third sort key (time of update), block size becomes n^(2/3), and complexity rises to O(n^(5/3)). That's a different beast, covered briefly later.

Is It Actually Correct?

Yes, and the proof is almost disappointingly short. Since all queries are on a static array and we process them offline, reordering them doesn't affect the answer for any individual query. The array doesn't change. The queries don't interact. The only requirement is that after processing query (l, r), the invariant cur_l == l and cur_r == r holds, and the frequency map reflects exactly arr[cur_l..cur_r]. The pointer movement code guarantees both. That's it.

One Structure, Every Language

The problem: given an integer array and Q queries of the form [L, R], for each query return the number of distinct elements in that subrange.

Before Mo's algorithm, this is what the approach looks like.

Drake meme: refusing O(nlogn) with 10 lines of code, approving O(n!) with 9 lines of code

"One fewer line of code" is a bold reason to take on factorial time complexity. Bold.

import sys
from math import isqrt

def mo_distinct(arr: list[int], queries: list[tuple[int, int]]) -> list[int]:
    n = len(arr)
    B = max(1, isqrt(n))

    indexed = [(l, r, i) for i, (l, r) in enumerate(queries)]
    indexed.sort(key=lambda x: (x[0] // B, x[1] if (x[0] // B) % 2 == 0 else -x[1]))

    freq: dict[int, int] = {}
    distinct = 0
    ans = [0] * len(queries)
    cur_l, cur_r = 0, -1

    def add(pos: int) -> None:
        nonlocal distinct
        v = arr[pos]
        freq[v] = freq.get(v, 0) + 1
        if freq[v] == 1:
            distinct += 1

    def remove(pos: int) -> None:
        nonlocal distinct
        v = arr[pos]
        freq[v] -= 1
        if freq[v] == 0:
            distinct -= 1

    for l, r, qi in indexed:
        while cur_r < r: cur_r += 1; add(cur_r)
        while cur_l > l: cur_l -= 1; add(cur_l)
        while cur_r > r: remove(cur_r); cur_r -= 1
        while cur_l < l: remove(cur_l); cur_l += 1
        ans[qi] = distinct

    return ans

What Mo's Can Actually Solve

Mo's algorithm handles any problem that fits this shape: you have a static array, all queries are known upfront, and you can maintain your answer as you extend or shrink the current window by exactly one element.

That last condition is the load-bearing one. If add(pos) and remove(pos) are O(1) (or O(log n) at worst), Mo's algorithm turns the whole batch into O(n√n) work.

Problems where Mo's is the right tool:

Non-composable aggregates over ranges. Distinct element count is the canonical example. Sum of squares of frequencies, the mode (most frequent element), the number of pairs with equal values: none of these can be merged across segment tree nodes in O(1). Mo's handles all of them by maintaining the aggregate incrementally.

When no segment tree, BIT, or sparse table fits. Some range statistics simply have no known efficient data structure. If you can maintain the answer under single-element additions and deletions, Mo's is often the path forward.

Frequency distribution problems. Any problem where the answer depends on the full frequency histogram of the current window: how many distinct frequencies exist, the maximum frequency, the sum of freq[v]^2 over all values (which equals the number of ordered pairs with equal values).

XOR, AND, OR over ranges. When updates to the aggregate from a single element are cheap (just XOR the running answer with the new element when adding, XOR again to remove), Mo's works directly.

The "Mo's with Updates" Variant

When the problem allows point updates in addition to range queries, the standard Mo's algorithm breaks because the array is no longer static. The fix adds a third dimension to the sort: the time (index of the most recent update applied to the current state). Block size becomes n^(2/3). Complexity rises to O(n^(5/3)).

The trade-off: O(n√n) is roughly 3×10^7 for n=10^5, comfortably fast. O(n^(5/3)) is roughly 2×10^8 for n=10^5, which may or may not pass depending on the constant factor.

Mo's on Trees

Tree path queries can also be processed with Mo's by flattening the tree using an Euler tour and tracking entry/exit times. A node is "included" if it appears an odd number of times in the current window. This lets you answer path aggregate queries in O(n√n) for problems that would otherwise require heavy-path decomposition or other complex tree data structures.

How to Spot When Mo's Algorithm Applies

Five signals that point to Mo's algorithm:

All queries arrive before you start answering them. The problem gives you the full list of queries as input. There is no online requirement, no "the next query depends on the previous answer."
The array does not change. No update operations during the query phase.
The aggregate is hard to merge. A segment tree or BIT would need O(n) to merge two nodes' data, or the property simply isn't composable at all.
You can maintain the answer under one-element changes. Adding or removing a single element from one end of the current window updates the answer in O(1) or O(log n).
The naive approach is O(n) per query. A brute-force scan of [L, R] gives the correct answer but is too slow. You need something smarter without building a full-blown index.

Worked Example 1: SPOJ DQUERY (Count Distinct)

Problem statement: Given an array of n integers and Q queries, each asking for the number of distinct elements in subarray [L, R].

Why it fits:

All queries known upfront: yes, they're given in input.
Static array: yes, no updates.
Hard to compose: distinct count of [L, M] and [M+1, R] doesn't give you distinct count of [L, R] without knowing which elements appear in both halves.
One-element maintenance: when you add element v to the window, if freq[v] goes from 0 to 1, increment distinct. When you remove v, if freq[v] goes from 1 to 0, decrement distinct. Both are O(1).
Naive approach: iterate [L, R], use a set, O(n) per query.

Mo's algorithm gives O(n√n) for this problem. For n = Q = 10^5, that's about 3.16×10^7 operations. The naive approach gives 10^10. A 300,000x speedup from one sort() call. Tell your friends.

The implementation is exactly the code shown in the language section above.

Worked Example 2: Count Pairs with Equal Values in [L, R]

Problem statement: Given an array and Q queries, each asking for the number of ordered pairs (i, j) with L ≤ i < j ≤ R such that arr[i] == arr[j].

Why it fits: Same first four signals apply. The clever part is the one-element maintenance.

When you add element v to a window that already contains freq[v] copies, the new element forms a pair with each existing copy. Increment the pair count by freq[v], then increment freq[v].

When you remove element v, decrement freq[v] first, then decrement the pair count by the new freq[v] (the remaining copies it was paired with).

pairs = 0
freq = {}

def add(pos):
    nonlocal pairs
    v = arr[pos]
    pairs += freq.get(v, 0)      # new pairs formed
    freq[v] = freq.get(v, 0) + 1

def remove(pos):
    nonlocal pairs
    v = arr[pos]
    freq[v] -= 1
    pairs -= freq[v]             # pairs broken

Both add and remove are O(1). Mo's algorithm processes the whole batch in O(n√n). No segment tree can answer this offline in less than O(n log n) preprocessing and O(√n) per query.

The pattern match here is the key skill. You recognized that the aggregate (pair count) changes by a predictable O(1) amount per window extension, even though computing the aggregate from scratch takes O(n) time.

The Pitfalls Nobody Warns You About

Coordinate compression before the frequency array. If array values can be up to 10^9 but there are only n = 10^5 distinct values, you can't allocate a frequency array of size 10^9. That would be around 4GB. Your laptop might protest. Compress coordinates down to [0, n-1] before running Mo's. A HashMap avoids this at the cost of a larger constant factor.

The zigzag sort matters for performance. The naive sort (ascending R within each block) causes O(n) wasted movement at every block boundary as R resets. The zigzag (alternating direction) eliminates this. The difference is roughly 2x in practice for large inputs. Always use the zigzag.

Block size is not always exactly sqrt(n). The optimal block size when Q is much smaller than n is n / sqrt(Q). For Q = 1000 and n = 10^5, sqrt(n) = 316 but n / sqrt(Q) = 3162, nearly 10x larger. Using sqrt(n) here wastes work on unnecessary left pointer movement. If Q << n, tune the block size.

The expansion-before-contraction order is not optional. If you shrink before expanding, you may temporarily have cur_l > cur_r, which means the window is empty or inverted and the freq table becomes inconsistent. Always expand right, expand left, shrink right, shrink left. Pair-wise grouping works; mixing expand and shrink arbitrarily does not.

Mo's algorithm is offline. Period. If the problem requires answering queries in order (because later queries depend on earlier answers), or if there are interleaved update operations in the same sequence, the standard Mo's algorithm does not apply. "Mo's with updates" handles point updates, but it adds significant complexity and degrades to O(n^(5/3)). Different algorithm. Different proof. Different headache.

Everything That Matters (Summary)

Mo's algorithm is an offline range query technique based on sorting queries into a block order that minimizes total pointer movement.
Block size B = sqrt(n) gives O(n√n) total work when Q ≈ n. Use B = n/sqrt(Q) when Q << n.
Right pointer movement is O(n²/B) = O(n√n). Left pointer movement is O(Q·B) = O(Q√n).
The zigzag sort (alternating R direction across consecutive blocks) halves the constant.
The algorithm applies when: queries are offline, the array is static, and the answer can be maintained under O(1) single-element additions and deletions.
The add/remove hooks are what you customize per problem. The rest of the framework is always the same.
Always expand before contracting to keep the window valid.
Coordinate-compress if array values are large but sparse.
Mo's with updates: block size n^(2/3), complexity O(n^(5/3)), adds a third sort dimension.

Mo's algorithm rewards a specific kind of thinking: "I can't query this efficiently, but can I maintain it incrementally?" If the answer is yes and your queries are offline, you have a path to O(n√n). Train yourself to ask that question and it starts showing up in problems that first looked impossible. Sort the queries. Slide the window. Collect your AC.

SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback. If you want to practice explaining the block size derivation or the pointer movement proof out loud under pressure, that's exactly the kind of depth it tests.

For range query problems where the aggregate is composable, the segment tree is usually the right tool. For the full picture on why O(1) hash map operations underlie the efficiency of Mo's frequency map, see hash map time complexity. The block size derivation is an instance of the same min-of-two-terms argument that shows up throughout algorithm analysis; big-O notation covers that pattern in detail.