Binary Exponentiation: Why Computing x^1000000000 Takes 30 Steps

Computing x^n the naive way takes n multiplications. Binary exponentiation takes O(log n). For n = 10^9, that's the difference between a billion operations and thirty. Also, in an interview, the difference between a working solution and a politely-worded "let's revisit the approach."

The mental model: every exponent written in binary tells you which powers of x to multiply together, and you build each successive power with one squaring. The result is an O(log n) algorithm that works on integers, floats, matrices, polynomials, or anything that has an associative multiplication and an identity element.

The algorithm was first documented in the Sanskrit treatise Chandah-sutra around 200 BC. It is 2,200 years old, predates Python by a comfortable margin, and is still running inside OpenSSL right now. Reach for it any time you see a large exponent, a "return the answer modulo 10^9+7" prompt, or a recurrence with n up to 10^18.

The Naive Approach Has an O(n) Problem

The obvious approach works:

result = 1
for _ in range(n):
    result *= x

It does exactly n multiplications. For n = 10^9 in Python, that runs for roughly ten seconds. In a coding interview, that's an eternity. In RSA, which exponentiates 2048-bit numbers to 2048-bit exponents, a naive loop is not just slow. It's physically impossible to finish before the heat death of the universe.

The number of multiplications is the bottleneck, not the speed of each one.

What the Binary Representation Reveals

Take n = 13. In binary:

13 = 1101₂ = 2³ + 2² + 2⁰ = 8 + 4 + 1

So x^13 = x^8 × x^4 × x^1.

And each of those powers is the square of the previous one:

x^1 = x
x^2 = (x^1)²
x^4 = (x^2)²
x^8 = (x^4)²

Three squarings to reach x^8. Then multiply together the powers at the set bit positions in 13 (positions 3, 2, and 0): x^8 × x^4 × x^1 = x^13. Two more multiplications.

Five multiplications instead of twelve. For n = 10^9, the binary representation has 30 bits, so you need at most 29 squarings plus at most 29 multiplications for set bits. Under 60 total.

The number of squarings is floor(log₂(n)), which is the position of the highest set bit. The number of extra multiplications is at most floor(log₂(n)). Both are O(log n).

Diagram showing n = 13 = 1101₂ with each bit labeled. Below: squaring chain x^1 to x^2 to x^4 to x^8. Dashed arrows from x^8, x^4, and x^1 feed into x^13. The zero bit at position 1 is grayed out, showing x^2 is skipped. n = 13 needs only the set bits. The zero bit at position 1 contributes nothing, but you still square past it.

The Recursive Form Matches the Math

The even/odd split translates directly into a recurrence:

x^n = 1                       if n = 0
x^n = (x^(n/2))²              if n is even
x^n = x × (x^((n-1)/2))²     if n is odd

def power(x: int, n: int) -> int:
    if n == 0:
        return 1
    half = power(x, n // 2)
    if n % 2 == 0:
        return half * half
    return x * half * half

Compute half once and square it. If you wrote power(x, n//2) * power(x, n//2), you'd recurse twice per level and the total calls would double at every step, giving O(n) time again. You just accidentally reimplemented the naive loop in recursive form. Congratulations. Store the result in half.

The recursion depth is floor(log₂(n)) + 1. Each call reduces n by half, so the call stack grows O(log n) deep, about 30 frames for n = 10^9. Stack-safe.

The Iterative Version Is What You Actually Deploy

The iterative form processes bits from the least significant to the most significant. Two variables carry all the state: result accumulates the final answer, and base tracks x^(2^k) for the current bit position k.

def power(x: int, n: int) -> int:
    result = 1
    base = x
    while n > 0:
        if n & 1:
            result *= base  # current bit is set; fold this power into result
        base *= base        # square to get the next power of x
        n >>= 1             # move to the next bit
    return result

The iterative version is faster in practice because it avoids recursive call overhead and cannot overflow the call stack, no matter how large n gets.

Here is the full trace for x^13 (n = 13 = 1101₂):

Trace table showing each iteration of the iterative algorithm for x^13. Columns: n in binary, LSB set, result after conditional multiply, base after squaring, n after right shift. Four iterations, four squarings, three multiplies into result. The zero bit at position 1 fires no multiply but still advances base from x² to x⁴.

Iteration	n (binary)	LSB set?	result	base
1	1101	yes	x	x²
2	110	no	x	x⁴
3	11	yes	x × x⁴ = x⁵	x⁸
4	1	yes	x⁵ × x⁸ = x¹³	x¹⁶
exit	0		x¹³

Four iterations for n = 13, matching floor(log₂ 13) + 1 = 4.

Modular Exponentiation and the Overflow You're Not Thinking About

In competitive programming and cryptography you almost always want (x^n) mod m. Computing x^n first and then taking the modulus is impossible: for x = 2, n = 10^9, and m = 10^9+7, the raw result has over 300 million decimal digits. Your 64-bit integer does not have opinions about 300 million digits. It just overflows and gives you garbage.

The identity that saves you: (a × b) mod m = ((a mod m) × (b mod m)) mod m.

Take the modulus after every multiplication:

def power(x: int, n: int, mod: int) -> int:
    result = 1
    base = x % mod
    while n > 0:
        if n & 1:
            result = result * base % mod
        base = base * base % mod
        n >>= 1
    return result

Taking mod after every step is not optional in languages with fixed-width integers. With mod = 10^9+7 and base up to 10^9, the product before mod reaches ~10^18, which barely fits in a 64-bit integer. With a larger modulus (say, 10^18), the product of two values near mod can overflow even a 64-bit type. The fix in C++ is a cast to __int128 for the multiplication; in Rust, cast to u128.

Two side-by-side bar charts. Left: base value grows explosively without mod, overflowing 64-bit integers by iteration 30. Right: base stays bounded below mod throughout all 60 iterations when mod is applied after each squaring. Left: a disaster. Right: fine. The only difference is two characters: % mod.

Python is immune because its integers are arbitrary precision. Every fixed-width language requires care.

Edge Cases That Will Ruin Your Day

Zero exponent. x^0 = 1 for all x. The loop body never executes when n = 0, so the iterative version returns 1 correctly without any special case. You get this one for free.

Negative exponent. x^(-n) = 1/x^n. For floating-point x, substitute x with 1.0/x and negate n. For integer x under plain multiplication, the result is a fraction and integer binary exponentiation does not apply. Under a prime modulus p, Fermat's little theorem gives a^(-1) ≡ a^(p-2) (mod p), so you compute a modular inverse via binary exponentiation. Math saving the day as usual.

The INT_MIN trap. When n is a 32-bit signed integer equal to -2,147,483,648, the expression -n overflows back to -2,147,483,648. Java's int type hands you this bug as a gift. This exact trap appears in LeetCode 50. Always widen to a 64-bit type before negating:

long N = n;  // widen first, negate second
if (N < 0) { x = 1.0 / x; N = -N; }

Skip this step and your code produces wildly wrong answers for exactly one input. The one input that shows up in tests.

0^0. Mathematically undefined, but the iterative implementation returns 1, which is the conventional answer in combinatorics (the empty product) and what every major language's pow function returns. If an interviewer asks, say "convention: empty product is 1" and move on.

Negative base with modular arithmetic. In C++ and Java, (-7) % 5 is -2, not 3. Start with base = ((x % mod) + mod) % mod to normalize a potentially negative base into the range [0, mod).

O(log n): The Proof Is One Sentence

Operation	Time	Space
Iterative power(x, n)	O(log n)	O(1)
Recursive power(x, n)	O(log n)	O(log n)
Modular power(x, n, mod)	O(log n)	O(1)
Matrix power, k×k matrices	O(k³ log n)	O(k²)

Time is O(log n) because the exponent is halved at every iteration. After k iterations, the remaining exponent is n/2^k. The loop ends when this drops below 1, which requires k > log₂(n). So the loop runs at most floor(log₂ n) + 1 times, and each iteration does exactly two multiplications (one squaring, one conditional multiply into result). Total: O(log n).

This is the same halving argument that makes binary search O(log n). Every time you cut the problem in half, you need one more step, and the steps stack logarithmically.

Iterative space is O(1) because the algorithm uses a fixed set of variables regardless of n: result, base, and exp.

Recursive space is O(log n) because the maximum recursion depth is floor(log₂ n) + 1 and each stack frame is O(1).

Matrix exponentiation time is O(k³ log n). The same O(log n) squarings happen, but each multiplication is now a k×k matrix multiply costing O(k³) with the naive algorithm. For Fibonacci (k = 2), that is a constant 8 scalar multiplications per matrix multiply.

The Algorithm Works on Any Monoid

Binary exponentiation requires only two properties from the operation:

Associativity: (a × b) × c = a × (b × c)
An identity element: there exists e such that a × e = e × a = a

The same bit-scanning loop works unchanged for integers, matrices, polynomials, and permutations. Swap out what "multiplication" and "1" mean, and the algorithm is identical. Category theorists would call this a monoid homomorphism. You can call it a useful trick.

The most powerful generalization is matrix exponentiation. Any sequence defined by a linear recurrence can be encoded as a matrix-vector product: apply the matrix once to advance one step, apply it n times (via binary exponentiation) to advance n steps. That collapses O(n) recurrence evaluation to O(k³ log n), where k is the number of state variables.

One Structure, Every Language

def power(base: int, exp: int, mod: int = 0) -> int:
    if exp < 0:
        raise ValueError("negative exponent; use float base or Fermat's little theorem")
    result = 1
    if mod:
        base %= mod
    while exp > 0:
        if exp & 1:
            result = result * base % mod if mod else result * base
        base = base * base % mod if mod else base * base
        exp >>= 1
    return result

# Python's built-in pow(base, exp, mod) is implemented in C using the same
# algorithm and also handles negative exp and negative mod. Use it in production.

What Problems Does Binary Exponentiation Solve?

Modular inverse via Fermat's little theorem. If p is prime and gcd(a, p) = 1, then a^(p-2) ≡ a^(-1) (mod p). This inverse lets you "divide" inside modular arithmetic. Computing n! / (k! × (n-k)!) mod p requires dividing by the denominator, which means computing its modular inverse. Every such call is a binary exponentiation. The standard competitive programming modulus 10^9+7 is prime specifically because this technique works on primes.

Linear recurrences in O(log n). Any sequence defined by f(n) = c₁f(n-1) + c₂f(n-2) + ... can be encoded as a matrix-vector product, and the matrix raised to the nth power gives f(n) in O(k³ log n) time. For sequences where n reaches 10^18, this is the only viable approach.

Public-key cryptography. RSA encryption is c = m^e mod n. Decryption is m = c^d mod n. Diffie-Hellman key exchange computes g^a mod p and g^b mod p. All three are direct calls to modular binary exponentiation on numbers with hundreds or thousands of bits. OpenSSL's BN_mod_exp uses a windowed variant that groups multiple bits to reduce the multiplication count further.

Primality testing. The inner step of Miller-Rabin is: for a witness a, compute a^((p-1)/2) mod p and check whether the result is 1 or p-1. Each witness test is one binary exponentiation.

How to Recognize a Problem That Calls for This

The clearest signals:

"Return the answer modulo 10^9+7" on a problem that involves choosing or counting (you'll need a modular inverse for division, which needs this)
"Compute x^n mod m" with n or m up to 10^9 or 10^18
"Find the nth term of a recurrence" with n up to 10^18
"Count paths of length exactly n in a graph" with large n (raise the adjacency matrix to the nth power)

Worked Example 1: Pow(x, n) (LeetCode 50)

Problem. Implement pow(x, n) for floating-point x and 32-bit integer n, where n can be negative.

Why binary exponentiation. n can reach 2^31 - 1. A naive loop times out and would need 2 billion iterations for n = 2^31 - 1.

The key trap. If n is a 32-bit integer and equals -2,147,483,648, then -n in a 32-bit operation overflows back to -2,147,483,648. Cast to long before negating.

def myPow(x: float, n: int) -> float:
    if n < 0:
        x = 1.0 / x
        n = -n          # safe in Python; in Java/C++ cast to long first
    result = 1.0
    while n > 0:
        if n & 1:
            result *= x
        x *= x
        n >>= 1
    return result

Time: O(log |n|). Space: O(1).

Worked Example 2: nth Fibonacci mod 10^9+7 with n up to 10^18

Problem. Compute F(n) modulo 10^9+7 where n can be up to 10^18.

Why binary exponentiation. A plain DP loop runs in O(n). For n = 10^18, that would take roughly 10^10 seconds. Even if you somehow parallelized it, you'd still be running when the sun burns out.

The transformation. The Fibonacci recurrence F(n) = F(n-1) + F(n-2) can be written as:

[F(n+1)]   [1 1]^n   [F(1)]
[F(n)  ] = [1 0]   × [F(0)]

Raising the 2x2 transition matrix to the nth power gives F(n) in O(8 log n) scalar multiplications.

Diagram showing the 2x2 Fibonacci matrix M being squared repeatedly: M^1 to M^2 to M^4, then via binary exponentiation to M^n. The top-right entry of M^n equals F(n). The top-right entry of M^n is always F(n). Square the matrix O(log n) times instead of stepping O(n) times.

MOD = 10**9 + 7

def mat_mul(A, B):
    return [
        [(A[0][0]*B[0][0] + A[0][1]*B[1][0]) % MOD,
         (A[0][0]*B[0][1] + A[0][1]*B[1][1]) % MOD],
        [(A[1][0]*B[0][0] + A[1][1]*B[1][0]) % MOD,
         (A[1][0]*B[0][1] + A[1][1]*B[1][1]) % MOD],
    ]

def mat_pow(M, n):
    result = [[1, 0], [0, 1]]  # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        n >>= 1
    return result

def fib(n: int) -> int:
    if n == 0:
        return 0
    return mat_pow([[1, 1], [1, 0]], n)[0][1]

For n = 10^18, this finishes in about 60 matrix multiplications. The plain DP loop would need 10^18.

What to Take Away

Naive x^n does n multiplications. Binary exponentiation does O(log n) by exploiting the binary representation of the exponent.
Iterative form: check each bit of n from LSB to MSB, square the base on every iteration, multiply into result when the bit is set.
Modular version: take mod after every multiplication. Deferring it causes overflow in fixed-width types.
The INT_MIN bug: cast to 64-bit before negating a negative exponent. -n on a 32-bit -2^31 overflows silently.
Iterative space is O(1). Recursive form uses O(log n) stack space.
The algorithm generalizes to any associative operation with an identity: matrices, polynomials, permutations.
Python's pow(base, exp, mod) is this algorithm in C. Use it.
The algorithm is 2,200 years old and still the fastest known general method for large exponents.

If you want to practice explaining modular overflow, INT_MIN traps, and matrix exponentiation out loud while someone nods or asks follow-up questions, SpaceComplexity runs voice-based mock interviews on exactly these kinds of problems with rubric-based feedback on how clearly you communicated the edge cases, not just whether the code ran.