In-Place Algorithms: Why Your O(1) Space Claim Is Probably Wrong

You see a LeetCode problem that says "in-place, O(1) extra space." You write a recursive solution. No extra arrays anywhere. Must be O(1), right?

Not quite. The recursion stack is memory, and it counts.

This is the single most common mistake engineers make when analyzing space complexity. An in-place algorithm is not defined as "no data structures I wrote myself." It means the algorithm's auxiliary memory usage is bounded by a constant, regardless of input size. Recursion depth translates directly to stack frames. Stack frames take real RAM.

r/ProgrammerHumor: "dontDoRecursiveFibKids", a meme about the hidden cost of naive recursion

Recursion looks innocent. It is not.

Auxiliary Space Is Not Total Space

Auxiliary space is the extra memory an algorithm uses beyond the input itself. Total space complexity counts everything: the input array plus whatever the algorithm allocates on top. Auxiliary space excludes the mandatory input storage and asks only about the overhead.

An algorithm that takes an array of n integers and computes a sum has space complexity O(n) but auxiliary space O(1). One accumulator variable, nothing else. When people say an algorithm is "in-place," they mean its auxiliary space is O(1), sometimes loosened to O(log n).

This distinction matters because all sorting algorithms receive their input array for free. The question is what they need beyond that.

The Stack Is Not Free

Call a recursive function. Your runtime pushes a frame onto the call stack containing a return address (8 bytes on x86-64), a saved frame pointer (8 bytes), local variables, and any spilled registers, all kept 16-byte aligned by the System V AMD64 ABI. That puts a typical frame in the 64 to 128 byte range once you count alignment padding and a handful of locals. Stack Overflow the website is named after this exact failure mode. You push enough frames and your OS terminates the process without so much as a goodbye.

Recurse n levels deep and you have n frames on the stack simultaneously. That is O(n) auxiliary space, regardless of whether you called malloc even once.

def binary_search_recursive(arr, lo, hi, target):
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        return binary_search_recursive(arr, mid + 1, hi, target)
    else:
        return binary_search_recursive(arr, lo, mid - 1, target)

This recurses at most log₂(n) levels deep. At peak, the call stack holds O(log n) frames. The recursive version of binary search uses O(log n) auxiliary space, not O(1).

Now the iterative version:

def binary_search_iterative(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Three variables. Fixed stack depth. O(1) auxiliary space. This is the genuinely in-place version.

The algorithm is identical. The choice of iteration vs recursion is what changes the space class.

Same array, same search logic, completely different memory profile.

Which Sorts Are Actually In-Place?

Algorithm	Auxiliary Space	In-Place?
Heapsort	O(1)	Yes (strict)
Insertion sort	O(1)	Yes (strict)
Bubble sort	O(1)	Yes (strict)
Quicksort with TCO	O(log n) guaranteed	Yes (loose)
Quicksort naive	O(log n) avg, O(n) worst	Debatable
Merge sort	O(n)	No
Timsort (Python, Java)	O(n)	No

Timsort deserves a callout. Python's sorted() and Java's Arrays.sort for objects both use Timsort. It is stable, fast in practice, and not in-place. The merge step requires a temporary buffer. When you tell someone "Python's sort is in-place," you mean it sorts the list without returning a new one. The memory picture is different. These are not the same sentence.

Heapsort Is the Gold Standard

Heapsort uses O(1) auxiliary space, and it is worth understanding exactly why. The entire algorithm runs inside the input array.

Phase 1 (build-heap) uses Floyd's bottom-up approach: start from the last non-leaf node, call sift-down toward the root. No temporary array. The heap property is established by rearranging elements already in the array.

Phase 2 (extraction) repeatedly swaps the root (the maximum) with the last element, shrinks the heap boundary by one, and calls sift-down on the new root. Everything stays in the original array.

Critically, sift-down is implemented iteratively. No recursion. Just a while loop following one path from parent to child.

def sift_down(arr, n, i):
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            break
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest

The while loop updates i in place instead of recursing. Stack depth stays constant throughout the entire sort. O(1) auxiliary space is guaranteed, not just hoped for.

This is why heapsort is used as a fallback in introsort (Musser 1997), the algorithm behind GCC's std::sort and LLVM's libc++. When quicksort's recursion exceeds about 2·log₂(n) levels, the implementation switches to heapsort precisely because heapsort cannot blow the stack. You can see this directly in libstdc++ __introsort_loop, which calls a heap-based __partial_sort the moment its depth_limit hits zero.

Heapsort operating entirely within the input array. Phase 1 shows sift-down arrows staying inside array bounds. Phase 2 shows the sorted region growing on the right while the heap shrinks. The call stack bar at the bottom stays flat throughout.

No temporary buffers. No recursion. Just the array, rearranging itself.

Quicksort's In-Place Claim Is Complicated

Quicksort is described as in-place constantly. In textbooks. In interview prep guides. By every candidate who has ever answered a complexity question on a whiteboard. The reality is messier.

Standard recursive quicksort makes two recursive calls after partitioning. On a sorted input with a naive first-element pivot, every partition produces subarrays of size 0 and n-1. The recursion goes n levels deep. Naive quicksort can use O(n) auxiliary space on the call stack, with no extra arrays allocated anywhere.

Random pivot selection gives O(log n) expected depth, but expected is not worst-case. Sedgewick's 1978 fix guarantees O(log n) worst-case stack depth regardless of pivot quality: always recurse into the smaller partition, then handle the larger one by updating loop bounds.

def quicksort(arr, lo, hi):
    while lo < hi:
        p = partition(arr, lo, hi)
        if (p - lo) < (hi - p):
            quicksort(arr, lo, p - 1)   # recurse into smaller (left)
            lo = p + 1                   # iterate over larger (right)
        else:
            quicksort(arr, p + 1, hi)   # recurse into smaller (right)
            hi = p - 1                   # iterate over larger (left)

The recursive call always enters the smaller half, which has at most floor((n-1)/2) elements. That means the recursion can only halve at each level, bounding depth at log₂(n). The while loop handles the larger partition without pushing a new frame.

Left: naive quicksort on a sorted array degenerates to a chain of depth n. Right: Sedgewick's optimization on the same array, bounded at depth log n because large partitions become loop iterations rather than recursive calls.

One line of code (recurse smaller, loop larger) is the difference between O(n) and O(log n) stack usage.

For a deep comparison of quicksort and merge sort tradeoffs, see Merge Sort vs Quick Sort: The Real Analysis, Myths Included.

Merge Sort Cannot Merge in O(1)

The merge step takes two sorted subarrays and produces one sorted output. To do this without corrupting input as you read it, you need a buffer. Think of it like merging two sorted piles of mail. You cannot push them together into one sorted pile without a surface to spread them on. Your kitchen counter is O(n) auxiliary space. The standard implementation allocates O(n) total auxiliary space across a given level of recursion.

True O(1) in-place merge is theoretically possible but comes at a severe cost. Huang and Langston's 1988 CACM paper gave a linear-time in-place merge for linked lists, but the array version pays for it: swapping √n-sized blocks is itself O(√n) per swap, and stable variants either give up the O(n log n) time bound or accept large constants. None are practical for general use. When you see merge sort listed as O(n) space, that is not a lazy implementation choice. It reflects the reality of merging two sorted sequences without a scratch buffer.

This is why comparison-based in-place stable sorting is a famously hard problem. Heapsort gets you O(1) space but gives up stability. Merge sort gets you stability but costs O(n) space. You pick your constraint.

The Strict Definition and the Loose One

The term "in-place" has two definitions in active use.

Strict (O(1) auxiliary): Only a constant number of variables beyond the input. Iterative algorithms or tail-recursive ones with compiler optimization. Heapsort, iterative binary search, bubble sort, insertion sort, two-pointer scans, Dutch National Flag.

Loose (O(log n) auxiliary): The logarithmic recursion stack is treated as negligible because log n grows extremely slowly. Quicksort with TCO, recursive binary search, recursive divide-and-conquer with logarithmic depth. This is the practical standard in most interviews and textbooks.

Any algorithm using O(n) auxiliary space is not in-place under either definition. Merge sort, Timsort, and recursive DFS on a path-shaped graph all live outside the boundary.

Three Questions to Ask Before You Answer

When a recursive solution is on the board and an interviewer asks about space complexity, do not just scan for data structures you explicitly allocated. Ask three questions:

How deep does the recursion go?
Does that depth depend on input size?
What is each frame storing?

For a recursive approach with O(log n) depth, say: "This uses O(log n) auxiliary space for the call stack. If we need strict O(1) space, we convert to iteration." That answer is better than "O(1), I didn't allocate anything."

For iterative DFS, you are using an explicit stack data structure. It holds O(V) nodes in the worst case on a path graph. You can see it. It mocks you. But you have not escaped the cost. An explicit stack and a recursive call stack are the same memory at different addresses.

The question that cuts through everything: what is the maximum amount of data simultaneously live during execution, not counting the input? That is your auxiliary space. Walk the algorithm forward, track what accumulates, and note the peak.

Iterative DFS (left) with an explicit Python list growing to 6 elements on a path graph. Recursive DFS (right) with the call stack growing to 6 frames on the same graph. Both reach the same depth. Both use the same O(n) space.

One stack is yours. The other is the OS's. Neither is free.

To understand exactly what a call frame contains and why its size matters, The Call Stack: What Really Happens When You Call a Function breaks it down instruction by instruction. For the full analysis of recursion depth across tree shapes, Recursion Space Complexity: Your Stack Only Holds One Path at a Time covers the cases systematically.

Recap

Auxiliary space is extra memory beyond the input. Call stack frames count toward it.
Recursive binary search is O(log n) auxiliary. The iterative version is O(1).
Heapsort is the gold standard: iterative sift-down, O(1) auxiliary guaranteed.
Quicksort is O(log n) average auxiliary, O(n) worst case without Sedgewick's optimization. With tail-call optimization, O(log n) worst case.
Merge sort is O(n) auxiliary. That cost is not removable without giving up time or stability.
The strict definition of in-place requires O(1). The practical standard allows O(log n).
In interviews: state explicitly whether your space analysis counts the recursion stack, and what it comes to when it does.

Explaining exactly where each byte of memory comes from is one of the signals that separates a surface-level answer from a strong one. If you want to build that habit under realistic interview conditions, SpaceComplexity runs voice-based mock interviews with rubric-based feedback on exactly this kind of depth.