How Garbage Collection Works: The Pause Lives in Your Allocation Pattern

Your service had a latency spike. The flame graph blamed the garbage collector. Nothing in your code changed. Your on-call partner blamed the infrastructure. Infrastructure blamed the JVM. The JVM did not apologize. But the GC does not run arbitrarily. Your allocation pattern triggered it. Understanding why pauses happen, and which allocation choices cause them, is what separates engineers who tune their way out of GC trouble from engineers who shrug and bump the heap size.

From mark-and-sweep through the generational hypothesis to what Go's tricolor GC and Java's ZGC do differently, this is how garbage collection actually works, and how to write code that keeps the collector quiet.

The Problem Garbage Collection Is Solving

In languages without a garbage collector, you call malloc to allocate and free to release. Get the pairing wrong and you get a use-after-free bug, a double-free crash, or a memory leak that slowly starves your server. These bugs are subtle and painful.

A garbage collector removes that burden: you allocate freely, and the runtime figures out what is no longer reachable and reclaims it. The trade-off is giving up control over when memory is freed. That "when" is where pauses come from.

Language-by-language memory management meme: C is "you", JS allocates so much by default leaks are undetectable, Java leaks on 3 Billion Devices, Go collects your garbage, Haskell uses no memory until you look, Rust had a function steal your string Every language has a different answer to "who handles memory." This post is about how the "collects your garbage for you" camp actually does it.

Reference Counting: The Obvious Idea That Hits a Wall

The simplest approach is reference counting. Every object carries a counter. Each time you create a reference to the object, the counter goes up. When a reference is dropped, it goes down. When the counter hits zero, free the object immediately.

CPython does this. Swift's ARC does this. It has a real advantage: objects are freed the moment they become unreachable, so memory is predictable and deterministic.

The wall it hits is cycles. Not metaphorical cycles. Actual circular references that count themselves into permanent existence:

a = Node()
b = Node()
a.next = b
b.prev = a  # cycle
del a
del b       # both refcounts → 1, not 0; neither is freed

Both objects have a reference count of 1 (pointing at each other), so neither gets freed. They're both dead, they both know it, and neither will let go first. CPython handles this with a separate cycle detector, a generational mark-and-sweep that runs on top of refcounting specifically to find and break cycles. Most pure reference-counting runtimes need some version of this fallback.

Mark-and-Sweep: Trace the Live Graph

The alternative is tracing garbage collection. Instead of tracking counts, you periodically find everything that is reachable from your program and free everything else.

Mark-and-sweep has two phases:

Mark. Start from the roots (stack frames, global variables, CPU registers) and traverse the entire live object graph, marking every reachable object. This is a DFS or BFS over the heap.
Sweep. Walk the entire heap. Any object not marked is unreachable garbage. Free it.

GC object graph showing root set on the left connecting to blue marked live objects, with unreachable dashed objects disconnected on the right Roots (stack, globals, registers) can reach some objects and not others. Unreachable objects are collected regardless of their reference counts. Cycles are no obstacle.

Cycles are no problem here because the algorithm does not care about reference counts. If no root can reach an object, even through a cycle, it is garbage.

The naive version stops all application threads during both phases. That is a stop-the-world (STW) pause. For a large heap, marking alone can take hundreds of milliseconds. Your users notice. They file tickets. The tickets say "app froze." The tickets are not wrong. That is not acceptable for interactive software or latency-sensitive services.

The Tricolor Invariant: How Marking Runs Concurrently

The tricolor abstraction, introduced by Dijkstra and Lamport in 1978, is how modern collectors run the mark phase concurrently with your program.

Each object gets one of three colors:

White: not yet seen; a candidate for collection if it stays white until marking ends.
Grey: known to be reachable, but its outgoing references have not been scanned yet.
Black: fully scanned; all its children are grey or black.

The collector starts by greying the roots and then repeatedly picks a grey object, scans its children (greying the white ones), and colors the object black. When no grey objects remain, every white object is unreachable garbage.

Tricolor state machine showing white, grey, and black states with labeled transitions and the write barrier note Objects move one way through the state machine: white to grey when discovered, grey to black when fully scanned. The write barrier prevents the runtime from skipping any object.

The key invariant: no black object may directly reference a white object. If a black object points to a white one, the white one might get freed even though it is reachable. The invariant guarantees that cannot happen.

The problem is your code runs while the collector marks. You might store a pointer from a black object to a white one, violating the invariant before the collector notices. The fix is a write barrier: a small hook that fires on every pointer store. Whenever your code writes a pointer, the write barrier checks whether the invariant is about to break and greys the referent if necessary.

Go uses a hybrid barrier (Dijkstra insertion barrier + Yuasa snapshot barrier) that greys both the old referent being overwritten and the new one being written. This keeps the invariant intact without a STW remark phase at the end.

Why Brief Stop-the-World Pauses Still Happen

Even the most concurrent collectors need at least short STW pauses. Go's GC has two:

Mark setup. The runtime must enable the write barrier and establish the initial root set atomically. This pause is typically under 1 millisecond.
Mark termination. Finalize the mark phase, disable the write barrier, begin sweeping.

Between those two pauses, the collector marks concurrently with your goroutines. Java's ZGC goes further: it performs root scanning and even relocation concurrently, keeping STW pauses under one millisecond even on terabyte heaps. Shenandoah takes a similar approach. These are not magic; they require more sophisticated barriers and more implementation complexity, but the payoff is pauses that are effectively invisible to users.

The Generational Hypothesis: Most Objects Die Young

Across languages and workloads, empirical measurement shows the same thing: the vast majority of objects die very shortly after allocation.

A web server handling a request allocates a request struct, a response builder, a few parsed values, some temporary strings. When the request finishes, 90%+ of those objects are dead. They lived fast, died young, and left no forwarding address. A handful of objects, a database connection pool, a config struct, a global cache, live for the lifetime of the process.

Generational collectors exploit this. The heap is split into a young generation (where new objects go) and an old generation (where survivors are promoted after lasting several collections).

Generational heap layout showing Eden space, Survivor spaces S0 and S1 in the young generation, and the old generation (Tenured) with a promotion arrow pointing from survivors to old gen Young gen collections are frequent and cheap because the region is small and mostly garbage. Old gen collections are rare and expensive. The same logic explains why objects should die before they get promoted.

Young generation collections are frequent and cheap: the region is small and mostly full of garbage, so the collector has little live data to trace. Old generation collections (major GCs) are rare and expensive. The same logic explains why hash maps with large object-value graphs are GC-intensive: the collector must trace every value reachable from every bucket on every scan.

Java's G1 collector goes further: it divides the heap into equal-sized regions (1 to 32 MB each) tagged as eden, survivor, or old, and builds a collection set prioritizing the regions with the most garbage. "Garbage-first" is where the name comes from.

Bump Pointer Allocation: Why Allocation Is Free (Until It Isn't)

Young generation collectors typically use a copying collector. The young generation is divided into two semi-spaces: from-space and to-space. New objects are allocated in from-space with bump-pointer allocation:

# conceptually
def allocate(size):
    ptr = bump_ptr
    bump_ptr += size
    return ptr

That is the entire allocator. Not "essentially." Literally. No free-list traversal, no size-class lookup, no fragmentation handling. The internet has spent thirty years arguing about memory safety and the young-gen allocator is ptr += size. Allocation in a copying collector is approximately as fast as incrementing a pointer. This is often faster than malloc for the same reason.

Bump pointer allocation diagram showing four objects packed left-to-right in from-space memory with the bump_ptr arrow pointing to the next free slot Every allocation is one pointer increment. The free space sits to the right until it runs out. Then a minor GC fires and the roles of from-space and to-space swap.

When from-space fills up, a collection triggers. The collector traces the live object graph, copying each live object into to-space. Dead objects are simply abandoned and the entire from-space is reclaimed at once. Then the two spaces swap roles.

The catch: the collection cost is paid up front as a pause. Allocation looks free, but every byte you allocate is a byte that will eventually trigger a collection. The pause you see in production is the sum of all the allocation you did since the last collection.

How Your Code Drives Pause Frequency

The collector does not run on a timer. It runs when the heap grows past a threshold.

Go's GOGC=100 (the default) means: collect when the heap has grown by 100% since the last collection. Double the live heap size in one request burst, and the GC runs. High allocation rate means the heap grows faster and collections happen more often.

CPython's cycle detector uses a generational threshold: generation 0 triggers after 700 net allocations by default (gc.get_threshold() returns (700, 10, 10)). A tight loop creating small objects can hit that 700 limit faster than you would expect.

The most expensive pattern for generational collectors is the mid-life crisis: objects that survive long enough to be promoted to old generation but then die shortly after. The mid-life crisis. Named aptly. These objects did nothing wrong except arrive at exactly the wrong time. They accumulate in the old generation as garbage that can only be reclaimed in a major GC, which is expensive. You pay the promotion cost, you fill the old gen, and you pay again for the major collection.

This happens when you hold references longer than necessary. A common culprit: stashing request-scoped data in a long-lived map, then removing it later. The object survives the next minor GC, gets promoted, and the major GC pays for your late cleanup.

Write Code the Collector Can Handle

Allocation timeline showing two patterns: high allocation rate with frequent GC pauses (amber sawtooth), and low allocation rate with rare pauses (blue), both measured against a GC trigger threshold Allocation rate drives pause frequency. The collector does not run on a schedule. It runs when you give it enough garbage to care about.

The patterns that reduce GC pause frequency in practice:

Reduce allocation rate in hot paths. The fewer objects you create, the less the collector has to clean up. Reuse buffers. Go's sync.Pool is exactly for this: it holds objects that would otherwise be freed, returning them for reuse. .NET's ArrayPool<T> does the same. Java's ByteBuffer.allocateDirect for I/O-heavy code.

Prefer value types over pointer types where possible. In Go, a [1024]byte on the stack never sees the heap. In Java, int[] is a single heap object; ArrayList<Integer> is a heap object plus one heap object per element (each boxed integer). Hot-path code with boxed primitives is paying per-element allocation costs. This is the same reason a dynamic array outperforms a linked list in practice: contiguous memory means the GC has one object to trace, not N.

Do not hold cross-generational pointers longer than necessary. Every time an old-generation object points to a young-generation object, the write barrier must record that reference so minor GCs can trace it. These "remembered sets" are cheap per write, but a high volume of them adds up.

Tune thresholds based on your workload, but measure first. Most GC pauses are invisible. Enable GC logging before changing anything. Go's runtime/trace shows GC events on a timeline. JVM's -Xlog:gc* gives per-collection timing. Python's gc.set_debug(gc.DEBUG_STATS) prints collection statistics. Profile first, then tune GOGC or heap sizes. Tuning blind usually just trades one problem for another.

If you want to practice explaining this kind of reasoning out loud, which interviewers for systems roles increasingly expect, SpaceComplexity runs voice-based mock interviews that cover exactly this territory.

The Short Version

Reference counting frees objects immediately but cannot handle cycles without a secondary collector.
Mark-and-sweep traces the live object graph from roots; tracing garbage collection handles cycles natively.
The tricolor invariant lets marking happen concurrently with your program; write barriers maintain it.
Stop-the-world pauses are unavoidable for root scanning and phase transitions, but modern collectors shrink them to under a millisecond.
The generational hypothesis drives heap layout: young generation collections are cheap and frequent, old generation collections are expensive and rare.
Bump-pointer allocation is nearly free; the cost is deferred to collection time.
Your allocation rate, not the collector, determines how often pauses happen. Mid-life objects are the most expensive pattern.
Fix: allocate less in hot paths, reuse buffers, avoid holding references across generation boundaries longer than needed.