Exponential Search Algorithm: O(log i), Not O(log n), and Why That Matters

May 27, 202619 min read
dsaalgorithmsinterview-prepdata-structures
Exponential Search Algorithm: O(log i), Not O(log n), and Why That Matters
TL;DR
  • Exponential search algorithm probes positions 1, 2, 4, 8... to bracket the target, then binary searches that range in two clean phases.
  • Time complexity is O(log i) where i is the target's index, not O(log n), so a target at position 7 takes ~3 probes instead of 20.
  • Unbounded arrays are the canonical use case: binary search needs a known upper bound, exponential search does not.
  • Timsort's gallop mode is a direct application, running inside Python's sorted(), Java's Arrays.sort, and Swift's sort().
  • Two common traps: missing the index-0 check before the loop, and forgetting to clamp hi to n-1 when phase 1 exits via the length bound.
  • Worst case degrades to O(log n) when the target is absent and larger than all elements, collapsing to plain binary search.

Binary search starts in the middle. Your target is at index 7 in a million-element array. Binary search's first move: index 500,000. Bold. Confident. Completely indifferent to where you actually put the thing.

The exponential search algorithm scales its complexity to where the target lives, not how long the array is. If the target sits at index i, it finds it in O(log i) time regardless of what comes after. Target at index 7: about three probes. Binary search: about twenty.

The deeper motivation isn't clever front-loaded optimization, though. Jon Bentley and Andrew Chi-Chih Yao built it in 1976 for sorted sequences of unknown or infinite length. Binary search cannot start without a known upper bound. Exponential search doesn't need one.

Also called doubling search or galloping search, it runs inside code you use every day. Timsort's gallop mode, buried inside CPython, Java's Arrays.sort, and Swift's sort(), is literally this algorithm. Every time you called sorted() in Python, this was happening in your name.

Why can't I binary search my life / Your life isn't sorted meme Both binary search and exponential search require sorted input. There is no algorithm for the other problem.

Two Phases, One Search

The algorithm has two phases. Both have the good manners to stop early.

Phase 1 (Range Finding): Start at index 1. Double on each step, checking positions 1, 2, 4, 8, 16, 32, ... until the element at the current index is greater than or equal to the target, or you walk off the end of the array. Call the stopping index bound.

Phase 2 (Binary Search): Binary search the subarray [bound/2, min(bound, n-1)]. The target, if it exists, must live in that range.

arr: [ 2, 5, 8, 12, 16, 23, 38, 41, 56, 72 ]
idx:   0  1  2   3   4   5   6   7   8   9
target = 23

Phase 1 probes:
  bound=1: arr[1]=5  < 23, double
  bound=2: arr[2]=8  < 23, double
  bound=4: arr[4]=16 < 23, double
  bound=8: arr[8]=56 >= 23, stop

Phase 2 binary search in [4, 8]:
  mid=6: arr[6]=38 > 23, search left
  mid=5: arr[5]=23 == 23, found at index 5

Exponential search two-phase walkthrough on a ten-element sorted array, showing phase 1 doubling probes and phase 2 binary search range Phase 1 brackets the target in O(log i) probes. Phase 2 confirms in O(log i) more.

The Invariant That Makes It Correct

When phase 1 stops at bound, two facts hold:

  1. arr[bound/2] < target (the previous check; the loop would have stopped otherwise)
  2. arr[bound] >= target (the reason the loop stopped)

Because the array is sorted, every element before bound/2 is also less than the target. The target cannot be there. The range [bound/2, bound] is a tight bracket: the target, if present, is necessarily inside it.

Correctness invariant diagram showing the three zones: all less than target on the left, the search bracket in amber, all greater than or equal to target on the right The lower bound is safe not because we assume it, but because the previous loop iteration proved it.

What Happens Without a Known Length

For an infinite or unknown-length array, the bound >= n check disappears. Phase 1 simply doubles until arr[bound] >= target. Phase 2 is unchanged. No other comparison-based algorithm can search an unbounded sorted sequence in O(log i) time while reading only logarithmically many elements. That is the problem this algorithm was built for.

Unbounded sorted sequence diagram showing exponential probes doubling into an infinite array, with the binary search bracket clamping down on the target's position No upper bound needed. The algorithm finds its own bracket.

  position: 1  2  4  8  16  32  ?  ...  (unbounded)
  target at position 23

  probes: 1, 2, 4, 8, 16, 32
  arr[16] < target, arr[32] >= target
  binary search [16, 32] → find target

Exponential Search Time Complexity

The algorithm exposes one operation: search.

OperationTimeSpace
Search (iterative)O(log i)O(1)
Search (recursive)O(log i)O(log i)
Search, best caseO(1)O(1)
Search, worst case (target absent)O(log n)O(1)

Why O(log i): If the target is at index i, the loop doubles bound starting from 1 until bound > i. The number of doublings is ⌈log₂(i + 1)⌉, which is O(log i). The binary search range is [bound/2, bound], spanning at most bound/2 elements. Because bound is the smallest power of 2 exceeding i, we have bound ≤ 2i, so bound/2 ≤ i. Binary search on a range of at most i elements takes O(log i) comparisons. Total: O(log i) + O(log i) = O(log i).

Why O(log n) worst case: When the target is absent and greater than all elements, phase 1 runs until bound >= n, taking O(log n) doublings. Performance collapses to plain binary search, which is the right behavior when position offers no advantage.

Why O(1) best case: If arr[0] == target, the algorithm returns before any doubling. Genuinely, pleasantly O(1).

Why O(1) space (iterative): A fixed number of scalar variables regardless of input size. The recursive variant adds O(log i) call frames.

One Structure, Every Language

Each implementation covers the bounded case. For the unbounded case, remove the bound < n guard in phase 1 and the min(bound, n-1) clamp in phase 2.

def exponential_search(arr: list[int], target: int) -> int: n = len(arr) if n == 0: return -1 if arr[0] == target: return 0 bound = 1 while bound < n and arr[bound] < target: bound *= 2 lo, hi = bound // 2, min(bound, n - 1) while lo <= hi: mid = lo + (hi - lo) // 2 if arr[mid] == target: return mid if arr[mid] < target: lo = mid + 1 else: hi = mid - 1 return -1

What Problems It Solves

Searching unbounded or infinite sorted sequences. This is the canonical use case. When you have a sorted data source with no known length, a database cursor, a sorted file stream, a lazy generator, binary search is impossible. Exponential search's phase 1 only reads forward, doubling until it finds a bracket. Then phase 2 closes in.

Front-heavy sorted arrays. When you have good reason to believe the target lives near the start of a large sorted array, exponential search finds it in O(log i) instead of O(log n). Array with ten million entries, targets almost always in the first hundred: about seven probes total instead of twenty-three.

Merging runs of wildly unequal length. This is Timsort's gallop mode, and it is genuinely clever. When merging a run of length 8 into a run of length 50,000, you need to find where each element of the short run belongs in the long run. Exponential search locates the insertion point in O(log 8) = O(3) probes instead of O(log 50,000) ≈ 16. Timsort kicks into gallop mode as soon as one input consistently wins seven or more consecutive comparisons.

Timsort gallop mode diagram showing a short 8-element run being merged into a long 50,000-element run, with exponential probes finding the insertion point in log(8) comparisons instead of log(50000) The short run has 8 elements. Each one costs 3 probes to place, not 16. That savings compounds.

Lower bound and upper bound queries. Adapt phase 2 to find the first position where arr[i] >= target or the last position where arr[i] <= target by replacing the equality check with the appropriate boundary binary search.

Five Signals This Is the Right Tool

The word "unbounded" is a giveaway, but front-heavy distributions and unequal-length merge problems look like ordinary search problems until you think about position. Here are the five signals:

  1. The array is sorted but you cannot read its length.
  2. The problem statement says "infinite array," "stream," or "unbounded."
  3. You are explicitly told the target is likely near the beginning of a large sorted structure.
  4. You need to merge or interleave a short sorted sequence into a much longer one.
  5. The brute-force solution is linear scan, and the data is sorted, but there is no usable upper bound for binary search.

Worked Example 1: Search in an Infinite Sorted Array

Problem. You are given an ArrayReader interface with a single method reader.get(i) that returns the element at index i, or Integer.MAX_VALUE if i is out of bounds. The underlying array is sorted in ascending order. Find the index of target, or return -1 if it is not present.

Why exponential search fits: You cannot call .length(). Binary search needs hi before it can do anything. Exponential search's phase 1 only ever reads forward, doubling until it finds a position where reader.get(bound) >= target. Once it has that bracket, it hands off to binary search on a concrete range.

def search(reader, target: int) -> int: if reader.get(0) == target: return 0 bound = 1 while reader.get(bound) < target: bound *= 2 lo, hi = bound // 2, bound while lo <= hi: mid = lo + (hi - lo) // 2 val = reader.get(mid) if val == target: return mid if val < target: lo = mid + 1 else: hi = mid - 1 return -1

Complexity: O(log i) time, O(1) space, where i is the target's index. No length needed at any point.

Worked Example 2: Find Insert Position in a Large Sorted Log

Problem. You have a sorted array of 50 million log timestamps. New events come in and you need to find where each would be inserted to maintain sorted order. A profiler shows 95% of new events fall within the first 10,000 timestamps.

Why exponential search fits: Binary search costs O(log 50,000,000) ≈ 26 comparisons per event. Exponential search does O(log 10,000) ≈ 14 probes for 95% of them, and falls back to O(log n) for the other 5%. Average probes per query drops from 26 to roughly 15. The adaptation is a lower_bound binary search in phase 2 instead of an equality check.

import bisect def find_insert_position(timestamps: list[int], new_ts: int) -> int: n = len(timestamps) if n == 0 or timestamps[0] >= new_ts: return 0 bound = 1 while bound < n and timestamps[bound] < new_ts: bound *= 2 lo, hi = bound // 2, min(bound, n - 1) # lower_bound: find first index where timestamps[idx] >= new_ts result = hi + 1 while lo <= hi: mid = lo + (hi - lo) // 2 if timestamps[mid] >= new_ts: result = mid hi = mid - 1 else: lo = mid + 1 return result

Why lo = bound // 2 is a safe lower bound: At that index, timestamps[bound//2] < new_ts is guaranteed by the last successful loop iteration. No element before bound//2 can be the insertion point.

The Two Bugs Everyone Writes

Bug one: the missing index-zero check. Phase 1 starts bound at 1, not 0. If the target is at index 0, the loop never probes it. The binary search range starts at bound//2 = 0 and would eventually find the element, but it wastes comparisons getting there. Adding if arr[0] == target: return 0 before the loop makes the best case truly O(1). This is also what proves that index 0 is always checked before phase 1 runs, which feeds back into the correctness argument.

Bug two: forgetting to clamp hi. Writing hi = bound instead of hi = min(bound, n-1) causes an out-of-bounds access in the binary search phase. When the loop exits because bound >= n, the element at index bound was never actually checked. It doesn't exist. Using bound directly as hi is a segfault in C and an IndexError in Python. The clamp is mandatory.

Both bugs are easy to write under pressure and easy to miss in a quick read. The first causes a missed optimization; the second causes a crash. Know which is which before the interview.

The Short Version

  • Exponential search probes positions 1, 2, 4, 8, ... until it brackets the target, then binary searches the bracket.
  • Time complexity is O(log i) where i is the target's index, not O(log n).
  • For a target at index i, phase 1 takes ⌈log₂(i+1)⌉ probes; phase 2 searches at most i elements in O(log i).
  • Space is O(1) iterative, O(log i) recursive.
  • The algorithm works on unbounded sorted sequences where binary search cannot start.
  • Timsort's gallop mode is a direct application. It has been running every time you called sorted() in Python.
  • Two traps: missing the index-0 check, and forgetting to clamp hi to n-1.

If you want to practice explaining two-phase algorithms like this under real interview pressure, SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback. Walking through the invariant proof out loud while someone is probing your reasoning is a different skill from getting it right on paper.

For more on the binary search family, the binary search invariant deep dive covers the loop invariant proof in detail. The binary search off-by-one post covers the lo, hi, mid boundary bugs that come up in exactly the kind of phase-2 code shown here.

Further Reading