Segment Tree vs Fenwick Tree: One Question Decides It

Both run O(log n) for range queries and point updates. So the choice feels like personal taste, or maybe a coin flip, or maybe whichever one you learned first and never questioned again.

The real decision is whether your operation has an inverse. That single property determines which structure you can actually use, and no amount of clever implementation changes it. Everything else is commentary.

What Problem Both Structures Solve

You have an array. You want to answer aggregate range queries ("what is the sum of elements 3 through 11?") and update individual elements, both faster than O(n). The naive approach scans the whole range on every query. Both data structures preprocess the array into a tree so each operation costs O(log n).

That is where the similarity ends.

Side-by-side overview: Fenwick tree as a flat array with coverage bars, segment tree as an explicit binary tree Fenwick tree: each index covers a range determined by its lowest set bit. Segment tree: an explicit binary tree where each node stores the aggregate of its subtree.

The Fenwick Tree: A Flat Array With a Hidden Structure

Peter Fenwick published the Binary Indexed Tree in 1994. If you saw it cold, you would think it was a normal array with a slight twitch. It is not.

Each index i covers a range of length i & (-i), which isolates the lowest set bit of i. Index 6 in binary is 110. Its lowest set bit is 10 = 2, so tree[6] stores the aggregate of exactly 2 elements: indices 5 and 6. Index 8 is 1000, lowest set bit is 1000 = 8, so tree[8] covers all 8 elements from 1 to 8. Index 8 is doing the work of eight interns.

To compute a prefix aggregate up to index i, strip the lowest set bit repeatedly: i -= i & (-i). To propagate an update at index i, add the lowest set bit repeatedly: i += i & (-i). Both operations visit at most O(log n) nodes, and the entire traversal stays within a flat array with no pointer chasing.

class BIT:
    def __init__(self, n):
        self.tree = [0] * (n + 1)  # 1-indexed
        self.n = n

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def range_sum(self, l, r):
        return self.prefix(r) - self.prefix(l - 1)

range_sum(l, r) works because sum(l..r) = prefix(r) - prefix(l-1). That subtraction is the whole ballgame, and also the source of the Fenwick tree's one fatal limitation. We will get there.

Fenwick tree traversal: update at i=6 walks up by adding LSB (blue), query prefix(6) walks down by stripping LSB (amber) Update walks right (6 → 8, done). Query walks left (6 → 4 → 0, accumulate). Two directions, one bit trick, zero pointers.

The Segment Tree: Any Merge, Any Range

The segment tree builds an explicit binary tree over the array. Each internal node stores the aggregate of its two children. The root covers the full array. Leaves are individual elements. It is a very normal binary tree that happens to be very good at its job.

The iterative layout stores everything in a flat array of size 2n. Leaves live at indices n through 2n-1. The parent of node i is i >> 1. The children of node i are 2i and 2i+1. No recursion required. No pointers. Just arithmetic on indices, which is exactly what CPUs are good at.

class SegTree:
    def __init__(self, nums, merge=lambda a, b: a + b, identity=0):
        self.n = len(nums)
        self.merge = merge
        self.identity = identity
        self.tree = [identity] * (2 * self.n)
        for i, v in enumerate(nums):
            self.tree[self.n + i] = v
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = merge(self.tree[2*i], self.tree[2*i+1])

    def update(self, i, val):  # 0-indexed point update
        i += self.n
        self.tree[i] = val
        while i > 1:
            i >>= 1
            self.tree[i] = self.merge(self.tree[2*i], self.tree[2*i+1])

    def query(self, l, r):  # [l, r] inclusive, 0-indexed
        res = self.identity
        l += self.n
        r += self.n + 1
        while l < r:
            if l & 1:
                res = self.merge(res, self.tree[l])
                l += 1
            if r & 1:
                r -= 1
                res = self.merge(res, self.tree[r])
            l >>= 1
            r >>= 1
        return res

Pass merge=min, identity=float('inf') and you have a range-min tree. Pass merge=gcd and you get range-GCD. The structure does not care what the operation is, as long as it is associative and has an identity element. It will happily merge whatever you throw at it.

Iterative segment tree for n=8 with range query [2,5]: blue nodes collected on the left boundary walk, amber nodes collected on the right boundary walk Query [2,5] collects node 5 ([2..3]) from the left boundary and node 6 ([4..5]) from the right boundary. Four log(n) steps, two useful results.

The Invertibility Wall

Here is where the Fenwick tree hits a brick wall and the segment tree walks right through it.

To compute aggregate(l..r), the Fenwick tree computes prefix(r) INVERSE prefix(l-1). For addition, the inverse is subtraction. For XOR, the operation is its own inverse. But minimum has no inverse. Knowing min(1..r) = 3 and min(1..l-1) = 5 tells you nothing about min(l..r). The value 3 might come from before position l, or from within it. You cannot recover the range answer from two prefix answers because information has been irrecoverably crushed.

Same story for maximum, GCD over arbitrary integers, bitwise AND, and bitwise OR. All of them one-way streets.

This is not a workaround situation. It is a structural impossibility. You cannot subtract your way out of it. The decision collapses to one question: does your operation have an inverse over your value type? If no, use a segment tree. Full stop. Go home.

Range Updates: When Fenwick Catches Up

Suppose you want to add a value v to every element in range [l, r], then query the range sum. This is "range update, range query," and a naive Fenwick tree cannot do it. But two parallel Fenwick trees can, through a surprisingly elegant trick.

Define BITs B1 and B2. To add v to [l, r]:

B1.update(l, v);    B1.update(r+1, -v)
B2.update(l, v*(l-1)); B2.update(r+1, -v*r)

The prefix sum up to index i is then B1.prefix(i) * i - B2.prefix(i).

This works because a range update on a difference array becomes a point update on B1, and the offset correction lives in B2. The two-BIT technique is the Fenwick tree's answer to lazy propagation, but it only works for additive operations. Segment trees handle range updates via lazy propagation, which generalizes to any operation where you can compose pending updates, including min and max assignment. If your operation is not additive, you need the segment tree.

The K-th Element: A Fenwick Trick Worth Knowing

If your Fenwick tree stores frequencies, you can find the k-th smallest element in O(log n) using binary descent on the implicit tree structure, without a separate binary search loop:

def kth(tree, k, n):
    pos = 0
    for i in range(n.bit_length(), -1, -1):
        nxt = pos + (1 << i)
        if nxt <= n and tree[nxt] < k:
            k -= tree[nxt]
            pos = nxt
    return pos + 1  # 1-indexed result

At each step you ask whether jumping right covers enough elements. If yes, jump and subtract. If no, stay put. This mirrors a segment tree walk on the same implicit structure, but runs entirely in the flat array.

Binary descent for k=5 on an 8-element frequency BIT: steps i=3 and i=2 are skipped, i=1 and i=0 produce jumps that land at rank 4 Four steps, two jumps. The answer falls out without a separate binary search. The segment tree version is more natural to write, but this one stays hot in cache.

The segment tree version is more natural to write. The Fenwick version is faster because it stays in cache and avoids pointer chasing. Both O(log n).

Performance Reality

For pure prefix-sum workloads around n = 500,000, Fenwick trees run about 20 to 30 percent faster than iterative segment trees in practice. The constant factor matters when you are doing 10^7 operations under a tight time limit.

Why the gap? The Fenwick tree is a flat array of size n+1. The iterative segment tree is a flat array of size 2n. Both fit in L3 cache for moderate n. But the Fenwick tree's access pattern, while not strictly sequential, has excellent spatial locality. Parent nodes live at lower indices, so traversals tend to hit recently accessed cache lines. No drama. Just fast.

Segment trees have a known aliasing issue at around n = 2^19. Nodes at indices 2^18 and 2^19 land in the same L3 cache set on some architectures, causing cache thrashing. Fenwick trees have a similar issue at the same boundary. For n below 500,000, both are comfortably in cache and the gap mostly disappears.

Memory: Fenwick uses n+1 slots. Iterative segment tree uses 2n slots. At n = 10^7 with 32-bit integers, that is 40 MB versus 80 MB. On memory-constrained problems, this matters more than you think.

Segment Tree vs Fenwick Tree: When to Use Which

What you need	Fenwick	Segment Tree
Point update, range sum or XOR	Yes	Yes
Point update, range min or max	No	Yes
Range update (+delta), range sum	Yes (two-BIT)	Yes (lazy)
Range update, range min or max	No	Yes (lazy)
K-th element (frequency array)	Yes (descent)	Yes
Any associative operation	No	Yes
Code length	Shorter	Longer
Memory	n+1	2n iterative
Constant factor	Lower	Slightly higher

Worked Example: Count Smaller Numbers to the Right

LeetCode 315 asks: for each element, how many elements to its right are strictly smaller?

Process right to left. Coordinate-compress the values to ranks in [1, m]. Insert each element's rank into a Fenwick tree tracking frequencies, then query how many elements smaller than the current value have already been inserted.

def countSmaller(nums):
    sorted_unique = sorted(set(nums))
    rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
    m = len(sorted_unique)
    bit = BIT(m)
    result = []
    for x in reversed(nums):
        r = rank[x]
        result.append(bit.prefix(r - 1))  # count of ranks < r
        bit.update(r, 1)
    return result[::-1]

A segment tree works here too. But the operation is additive and you only need prefix sums, so the Fenwick tree is the right tool. It is shorter, faster, and the invertibility constraint is satisfied. No reason to pull out the bigger structure.

If the problem instead asked for the minimum element to the right of each index, you would switch to a segment tree with merge=min and a point-update, range-query pattern. The Fenwick tree cannot answer that query. Not "it is harder." Cannot.

Recap

Both structures run in O(log n) for point updates and range queries.
Fenwick trees require an invertible operation. Minimum, maximum, and GCD have no inverses.
Segment trees accept any associative operation with an identity element.
For additive workloads, Fenwick is shorter, uses less memory, and runs about 20 to 30 percent faster.
The two-BIT technique extends Fenwick to range-update, range-sum, but only for additive operations.
Binary descent on a Fenwick tree finds the k-th element in O(log n) without a separate binary search loop.
When the operation is additive and range-min is not needed: Fenwick. Anything else: segment tree.

If you are drilling these for interviews, SpaceComplexity will ask you to explain the invertibility constraint out loud, under time pressure, which is exactly where candidates go quiet and lose points.