The Call Stack: What Really Happens When You Call a Function

- Call stack is a contiguous memory region that grows with each function call and shrinks on return, capped at 8 MB on Linux/macOS
- Every call pushes a stack frame holding the return address, saved frame pointer, local variables, and callee-saved registers
- Recursion depth equals frame count equals space cost: O(n) for linear chains, O(log n) for halving, O(h) for tree DFS
- Iterative DFS with an explicit stack is the same algorithm as recursive DFS, just backed by heap memory instead of the call stack
- Python raises RecursionError at 1,000 frames as a software guard before the OS crashes the process with SIGSEGV
- Tail call optimization (GCC/Clang at -O2) reuses the current frame for the final recursive call; Python deliberately omits it to keep tracebacks readable
You type factorial(5) and get 120 back. Between those two moments, the machine tracked five simultaneous contexts, kept their local variables from colliding, and knew exactly which line to resume at after each one returned. No global state. No coordination code. Magic, apparently.
It is not magic. It is the call stack. And once you see the mechanism clearly, recursion stops feeling mysterious, stack overflow stops feeling random, and iterative DFS stops feeling like a clever trick. It starts feeling obvious.
Where Your Function's Memory Actually Lives
A process's virtual memory is carved into regions. Text holds the compiled machine code. The heap holds dynamically allocated memory: your malloc, your new, every Python object. The stack holds the active call chain.
The stack is a contiguous slab of memory that grows toward lower addresses every time you call a function and shrinks back when the function returns.

Virtual memory layout. The stack and heap grow toward each other. They never actually meet unless something has gone horribly wrong.
On Linux and macOS, the main thread's stack is 8 MB. macOS gives secondary threads 512 KB. Windows gives each thread 1 MB by default. These are not recommendations. They are walls.
What a Stack Frame Contains
Every function call gets its own chunk of that stack region: a stack frame. The frame holds everything the function needs to run and return cleanly.
Four things live in a typical frame: the return address, the saved frame pointer, local variables, and any callee-saved registers this function touched.

One stack frame. The return address is the machine's version of a sticky note that says "come back here when you're done."
Let's define each:
- Return address. When the function finishes, the CPU needs somewhere to jump. The
callinstruction pushes this automatically before jumping. It is the address of the instruction immediately after the call site. - Saved frame pointer. The frame pointer register (
rbpon x86-64) is a stable anchor into the current frame. Every function saves the caller'srbpfirst, so it can restore it before returning. - Local variables. Any variable that doesn't fit in a CPU register gets carved out here.
- Callee-saved registers. If the function uses
rbx,r12throughr15, the ABI requires it to save and restore them. Those saved values live in the frame.
One thing that does not usually live on the stack: arguments. On x86-64 Linux and macOS, the first six integer or pointer arguments pass in registers: rdi, rsi, rdx, rcx, r8, r9. Only a seventh argument or beyond spills to the caller's frame. This is worth knowing because it means a function call with six or fewer arguments adds almost no stack overhead beyond the frame itself.
The Call: Three Instructions Do All the Setup
Take this Python function, compiled to x86-64:
def add(a, b): result = a + b return result
The caller puts a in rdi and b in rsi. Then:
call add ; pushes return address onto stack, jumps to add
That single call instruction does two things in one step: it pushes the address of the next instruction (the one after call) onto the stack, then jumps to add. The CPU has now committed to coming back.
Inside add, the prologue runs three instructions:
add: pushq %rbp ; save caller's frame pointer movq %rsp, %rbp ; set rbp = current top of stack (our frame base) subq $16, %rsp ; carve out 16 bytes for local variables

Three instructions. That's the entire frame setup. The epilogue is leave; ret, which is two more. Five instructions total to enter and exit any function.
Subtracting from rsp allocates space because the stack grows downward. The compiler decides how much to subtract based on how many local variables the function has. It rounds up to maintain 16-byte alignment (an SSE requirement on x86-64).
After the prologue, rbp anchors the frame. Locals live at rbp - 8, rbp - 16, and so on. The saved frame pointer lives at rbp. The return address lives at rbp + 8.
Walking Through factorial(3)
def factorial(n): if n <= 1: return 1 return n * factorial(n - 1)
Here is what happens when factorial(3) is called.
Frame 1 is pushed. n = 3, return address points back to the caller. We check 3 <= 1, which is false, so we recurse.
Frame 2 is pushed. n = 2, return address points to the multiply instruction inside Frame 1 (the n * part, which can't run until the recursive call returns). We check 2 <= 1, false, recurse again.
Frame 3 is pushed. n = 1, return address points to Frame 2's multiply. We check 1 <= 1, true. We return 1.

The stack at maximum depth. Three frames, three separate n variables, zero shared state. That's the whole trick.
Now the unwinding. Frame 3's epilogue runs:
leave ; restores rsp = rbp, then pops rbp (restores caller's frame pointer) ret ; pops return address, jumps to it
ret pops the return address and jumps. We land inside Frame 2, right at the multiply. Frame 2 computes 2 * 1 = 2, runs its own epilogue, jumps back into Frame 1. Frame 1 computes 3 * 2 = 6, returns. Done.
The call stack is how the machine keeps "where was I?" at every nesting level, without any global bookkeeping.
Recursion Is Just a Call Stack. Literally.
This resolves most of the confusion around recursive code.
There is nothing special about a function calling itself. The CPU pushes a frame, runs the function body, pops when done. The code inside the frame is identical to its caller's code, but the frames are separate allocations with separate variables. n in Frame 1 and n in Frame 2 are different memory addresses.
The depth of recursion is the height of the stack at any given moment, which is exactly your space cost.
A linear chain like factorial(n) pushes n frames before hitting the base case: O(n) space. Binary search halves the problem each call, so the chain is at most log n frames deep: O(log n) space. A tree DFS pushes at most one frame per level of the tree: O(h) space, where h is the tree height. You can read the space complexity directly off the call chain's shape. The recursion time complexity guide works through this in more detail using recurrence trees.
The Explicit Stack Is the Same Algorithm
Iterative DFS with an explicit stack is not an optimization. It is the recursive algorithm, written out manually.
def dfs_recursive(node): if node is None: return visit(node) dfs_recursive(node.left) dfs_recursive(node.right) def dfs_iterative(root): stack = [root] while stack: node = stack.pop() if node is None: continue visit(node) stack.append(node.right) # right pushed first stack.append(node.left) # left comes off first (LIFO)
Every item you push onto the iterative stack represents state that would have been saved in a call frame: which node are we processing, what comes next. The CPU calls it a frame. You call it a list entry. Same idea.

Same algorithm. The recursive version outsources memory management to the OS. The iterative version does it manually on the heap. The OS's version has a 8 MB limit. The heap's version is "how much RAM do you have."
The practical difference is budget. Your stack list lives on the heap and can grow as large as available memory. The CPU's call stack is capped at 8 MB. Converting recursion to iteration means trading the OS-managed call stack for a heap-allocated one: same asymptotic space, much higher ceiling. The iterative preorder traversal walkthrough shows this substitution in concrete code for trees.
When the Stack Runs Out
Push enough frames and rsp walks off the end of the stack region. On Linux, the kernel catches this as a page fault and delivers SIGSEGV. The program crashes.

The classic. The image is itself a stack overflow. A rare case where the meme format is also the bug.
Python adds a software guard before the OS gets involved. The interpreter raises RecursionError at 1,000 frames by default, well before the kernel's wall. The choice of 1,000 is not principled science. It is a reasonable default that stops beginners from crashing their REPL while still being low enough that you notice the problem.
import sys sys.getrecursionlimit() # 1000 sys.setrecursionlimit(5000) # push the software limit out
Raising the limit does not eliminate the ceiling. It moves it. A 500,000-frame deep call on Python will still crash. The OS limit doesn't care what setrecursionlimit says.
Some compilers eliminate this problem entirely for a specific class of recursion. If a function's last act is a plain return recursive_call(...), the compiler can reuse the current frame instead of pushing a new one. The stack stays constant depth. This is tail call optimization, and GCC and Clang do it automatically at -O2. Python deliberately skips it. Guido van Rossum's reasoning: TCO erases the stack frames that make tracebacks readable, and confusing tracebacks are a worse trade than stack efficiency for most Python programs. Reasonable position. Painful if you wrote a tight tail-recursive accumulator and just discovered it on prod.
Recap
- Every function call pushes a stack frame containing: the return address, the saved frame pointer, local variables, and callee-saved registers.
- The
callinstruction pushes the return address and jumps;retpops it and jumps back. - Three prologue instructions set up the frame: save
rbp, pointrbpat currentrsp, subtract N fromrspto allocate locals. The epilogue (leave; ret) tears it down. - Recursive depth equals frame count, which equals space cost.
- Iterative DFS with an explicit stack is the same algorithm with a bigger budget.
- Stack overflow happens when
rspwalks off the stack region. Python raisesRecursionErrorat 1,000 frames as a software guard before this. - Tail call optimization reuses the current frame for the last recursive call, keeping stack depth constant. GCC and Clang do it; Python deliberately does not.
If you want practice explaining this kind of mechanism out loud under pressure, SpaceComplexity runs voice-based mock interviews with rubric-based feedback on exactly these follow-up questions: "walk me through what happens in memory when this function is called."