In-Place Algorithms: Why Your O(1) Space Claim Is Probably Wrong

May 26, 202611 min read
dsaalgorithmsinterview-prepdata-structures
In-Place Algorithms: Why Your O(1) Space Claim Is Probably Wrong
TL;DR
  • Auxiliary space excludes the input itself; call stack frames count toward it, not just explicitly allocated buffers.
  • Recursive binary search uses O(log n) auxiliary space; the iterative version is the genuinely O(1) variant.
  • Heapsort is the strictest in-place sort: iterative sift-down keeps stack depth constant throughout.
  • Quicksort with naive recursion can hit O(n) stack depth on sorted input; Sedgewick's tail-call trick guarantees O(log n).
  • Merge sort requires O(n) auxiliary space and cannot merge in O(1) without sacrificing time or stability.
  • The strict in-place definition requires O(1) auxiliary; the loose standard allows O(log n) for logarithmic recursion depth.
  • In interviews, explicitly state whether your space analysis counts the recursion stack and what it comes to.

You see a LeetCode problem that says "in-place, O(1) extra space." You write a recursive solution. No extra arrays anywhere. Must be O(1), right?

Not quite. The recursion stack is memory, and it counts.

This is the single most common mistake engineers make when analyzing space complexity. An in-place algorithm is not defined as "no data structures I wrote myself." It means the algorithm's auxiliary memory usage is bounded by a constant, regardless of input size. Recursion depth translates directly to stack frames. Stack frames take real RAM.

r/ProgrammerHumor: "dontDoRecursiveFibKids", a meme about the hidden cost of naive recursion

Recursion looks innocent. It is not.

Auxiliary Space Is Not Total Space

Auxiliary space is the extra memory an algorithm uses beyond the input itself. Total space complexity counts everything: the input array plus whatever the algorithm allocates on top. Auxiliary space excludes the mandatory input storage and asks only about the overhead.

An algorithm that takes an array of n integers and computes a sum has space complexity O(n) but auxiliary space O(1). One accumulator variable, nothing else. When people say an algorithm is "in-place," they mean its auxiliary space is O(1), sometimes loosened to O(log n).

This distinction matters because all sorting algorithms receive their input array for free. The question is what they need beyond that.

The Stack Is Not Free

Call a recursive function. Your runtime pushes a frame onto the call stack containing a return address (8 bytes), a saved frame pointer (8 bytes), local variables, and any spilled registers. A typical frame is 64 to 128 bytes. Stack Overflow the website is named after this exact failure mode. You push enough frames and your OS terminates the process without so much as a goodbye.

Recurse n levels deep and you have n frames on the stack simultaneously. That is O(n) auxiliary space, regardless of whether you called malloc even once.

def binary_search_recursive(arr, lo, hi, target): if lo > hi: return -1 mid = (lo + hi) // 2 if arr[mid] == target: return mid elif arr[mid] < target: return binary_search_recursive(arr, mid + 1, hi, target) else: return binary_search_recursive(arr, lo, mid - 1, target)

This recurses at most log₂(n) levels deep. At peak, the call stack holds O(log n) frames. The recursive version of binary search uses O(log n) auxiliary space, not O(1).

Now the iterative version:

def binary_search_iterative(arr, target): lo, hi = 0, len(arr) - 1 while lo <= hi: mid = (lo + hi) // 2 if arr[mid] == target: return mid elif arr[mid] < target: lo = mid + 1 else: hi = mid - 1 return -1

Three variables. Fixed stack depth. O(1) auxiliary space. This is the genuinely in-place version.

The algorithm is identical. The choice of iteration vs recursion is what changes the space class.

Left: recursive binary search on a 16-element array showing 4 call frames stacking up. Right: iterative version with a single frame holding lo, hi, and mid.

Same array, same search logic, completely different memory profile.

Which Sorts Are Actually In-Place?

AlgorithmAuxiliary SpaceIn-Place?
HeapsortO(1)Yes (strict)
Insertion sortO(1)Yes (strict)
Bubble sortO(1)Yes (strict)
Quicksort with TCOO(log n) guaranteedYes (loose)
Quicksort naiveO(log n) avg, O(n) worstDebatable
Merge sortO(n)No
Timsort (Python, Java)O(n)No

Timsort deserves a callout. Python's sorted() and Java's Arrays.sort for objects both use Timsort. It is stable, fast in practice, and not in-place. The merge step requires a temporary buffer. When you tell someone "Python's sort is in-place," you mean it sorts the list without returning a new one. The memory picture is different. These are not the same sentence.

Heapsort Is the Gold Standard

Heapsort uses O(1) auxiliary space, and it is worth understanding exactly why. The entire algorithm runs inside the input array.

Phase 1 (build-heap) uses Floyd's bottom-up approach: start from the last non-leaf node, call sift-down toward the root. No temporary array. The heap property is established by rearranging elements already in the array.

Phase 2 (extraction) repeatedly swaps the root (the maximum) with the last element, shrinks the heap boundary by one, and calls sift-down on the new root. Everything stays in the original array.

Critically, sift-down is implemented iteratively. No recursion. Just a while loop following one path from parent to child.

def sift_down(arr, n, i): while True: largest = i left, right = 2 * i + 1, 2 * i + 2 if left < n and arr[left] > arr[largest]: largest = left if right < n and arr[right] > arr[largest]: largest = right if largest == i: break arr[i], arr[largest] = arr[largest], arr[i] i = largest

The while loop updates i in place instead of recursing. Stack depth stays constant throughout the entire sort. O(1) auxiliary space is guaranteed, not just hoped for.

This is why heapsort is used as a fallback in introsort (GCC's std::sort, LLVM's libc++): when quicksort's recursion goes too deep, the implementation switches to heapsort precisely because heapsort cannot blow the stack.

Heapsort operating entirely within the input array. Phase 1 shows sift-down arrows staying inside array bounds. Phase 2 shows the sorted region growing on the right while the heap shrinks. The call stack bar at the bottom stays flat throughout.

No temporary buffers. No recursion. Just the array, rearranging itself.

Quicksort's In-Place Claim Is Complicated

Quicksort is described as in-place constantly. In textbooks. In interview prep guides. By every candidate who has ever answered a complexity question on a whiteboard. The reality is messier.

Standard recursive quicksort makes two recursive calls after partitioning. On a sorted input with a naive first-element pivot, every partition produces subarrays of size 0 and n-1. The recursion goes n levels deep. Naive quicksort can use O(n) auxiliary space on the call stack, with no extra arrays allocated anywhere.

Random pivot selection gives O(log n) expected depth, but expected is not worst-case. Sedgewick's fix guarantees O(log n) worst-case stack depth regardless of pivot quality: always recurse into the smaller partition, then handle the larger one by updating loop bounds.

def quicksort(arr, lo, hi): while lo < hi: p = partition(arr, lo, hi) if (p - lo) < (hi - p): quicksort(arr, lo, p - 1) # recurse into smaller (left) lo = p + 1 # iterate over larger (right) else: quicksort(arr, p + 1, hi) # recurse into smaller (right) hi = p - 1 # iterate over larger (left)

The recursive call always enters the smaller half, which has at most floor((n-1)/2) elements. That means the recursion can only halve at each level, bounding depth at log₂(n). The while loop handles the larger partition without pushing a new frame.

Left: naive quicksort on a sorted array degenerates to a chain of depth n. Right: Sedgewick's optimization on the same array, bounded at depth log n because large partitions become loop iterations rather than recursive calls.

One line of code (recurse smaller, loop larger) is the difference between O(n) and O(log n) stack usage.

For a deep comparison of quicksort and merge sort tradeoffs, see Merge Sort vs Quick Sort: The Real Analysis, Myths Included.

Merge Sort Cannot Merge in O(1)

The merge step takes two sorted subarrays and produces one sorted output. To do this without corrupting input as you read it, you need a buffer. Think of it like merging two sorted piles of mail. You cannot push them together into one sorted pile without a surface to spread them on. Your kitchen counter is O(n) auxiliary space. The standard implementation allocates O(n) total auxiliary space across a given level of recursion.

True O(1) in-place merge is theoretically possible but comes at a severe cost. Known in-place merge algorithms either sacrifice the O(n log n) time bound (degrading to O(n log² n) comparisons) or sacrifice stability. None are practical for general use. When you see merge sort listed as O(n) space, that is not a lazy implementation choice. It reflects the reality of merging two sorted sequences without a scratch buffer.

This is why comparison-based in-place stable sorting is a famously hard problem. Heapsort gets you O(1) space but gives up stability. Merge sort gets you stability but costs O(n) space. You pick your constraint.

The Strict Definition and the Loose One

The term "in-place" has two definitions in active use.

Strict (O(1) auxiliary): Only a constant number of variables beyond the input. Iterative algorithms or tail-recursive ones with compiler optimization. Heapsort, iterative binary search, bubble sort, insertion sort, two-pointer scans, Dutch National Flag.

Loose (O(log n) auxiliary): The logarithmic recursion stack is treated as negligible because log n grows extremely slowly. Quicksort with TCO, recursive binary search, recursive divide-and-conquer with logarithmic depth. This is the practical standard in most interviews and textbooks.

Any algorithm using O(n) auxiliary space is not in-place under either definition. Merge sort, Timsort, and recursive DFS on a path-shaped graph all live outside the boundary.

Three Questions to Ask Before You Answer

When a recursive solution is on the board and an interviewer asks about space complexity, do not just scan for data structures you explicitly allocated. Ask three questions:

  1. How deep does the recursion go?
  2. Does that depth depend on input size?
  3. What is each frame storing?

For a recursive approach with O(log n) depth, say: "This uses O(log n) auxiliary space for the call stack. If we need strict O(1) space, we convert to iteration." That answer is better than "O(1), I didn't allocate anything."

For iterative DFS, you are using an explicit stack data structure. It holds O(V) nodes in the worst case on a path graph. You can see it. It mocks you. But you have not escaped the cost. An explicit stack and a recursive call stack are the same memory at different addresses.

The question that cuts through everything: what is the maximum amount of data simultaneously live during execution, not counting the input? That is your auxiliary space. Walk the algorithm forward, track what accumulates, and note the peak.

Iterative DFS (left) with an explicit Python list growing to 6 elements on a path graph. Recursive DFS (right) with the call stack growing to 6 frames on the same graph. Both reach the same depth. Both use the same O(n) space.

One stack is yours. The other is the OS's. Neither is free.

To understand exactly what a call frame contains and why its size matters, The Call Stack: What Really Happens When You Call a Function breaks it down instruction by instruction. For the full analysis of recursion depth across tree shapes, Recursion Space Complexity: Your Stack Only Holds One Path at a Time covers the cases systematically.

Recap

  • Auxiliary space is extra memory beyond the input. Call stack frames count toward it.
  • Recursive binary search is O(log n) auxiliary. The iterative version is O(1).
  • Heapsort is the gold standard: iterative sift-down, O(1) auxiliary guaranteed.
  • Quicksort is O(log n) average auxiliary, O(n) worst case without Sedgewick's optimization. With tail-call optimization, O(log n) worst case.
  • Merge sort is O(n) auxiliary. That cost is not removable without giving up time or stability.
  • The strict definition of in-place requires O(1). The practical standard allows O(log n).
  • In interviews: state explicitly whether your space analysis counts the recursion stack, and what it comes to when it does.

Explaining exactly where each byte of memory comes from is one of the signals that separates a surface-level answer from a strong one. If you want to build that habit under realistic interview conditions, SpaceComplexity runs voice-based mock interviews with rubric-based feedback on exactly this kind of depth.

Further Reading