What Are Garbage Collection Pauses? The Latency Tax on Your Runtime

- Traditional compacting collectors must stop the world (STW) while moving objects: partial reference updates would cause crashes or silent data corruption
- Minor GC collects the young generation in 5–50 ms; a full GC on a large heap can pause for 200 ms to several seconds
- Allocation rate is the primary lever: more short-lived objects mean more frequent minor GCs and faster promotion to the old generation, which triggers expensive full GC sooner
- ZGC achieves sub-millisecond pauses, and Shenandoah comparably short ones, largely independent of heap size, by doing compaction and reference updates concurrently with the application
- Python's reference counting frees most objects the moment they become unreachable, with no collector pause; the cyclic GC exists only to reclaim circular references and can be disabled in latency-sensitive paths with `gc.disable()`
- In system design interviews, name the GC algorithm and its p99 pause characteristics when discussing latency guarantees

You're running a production service. Latency is steady at 20 milliseconds. P99 looks healthy. On-call hasn't paged anyone in weeks. Life is good.
Then, for no obvious reason, a few hundred requests suddenly take 500 milliseconds. The spike passes. Normal latency resumes. No errors in the logs. Nothing in the traces. Your incident timeline has a suspicious five-word entry: "it fixed itself, I think."
That's a garbage collection pause. Your program did nothing wrong. Your infrastructure did nothing wrong. The runtime decided to pause every application thread in your process, clean up some memory, and quietly resume. It does this whether you asked for it or not.
Why Your Runtime Sometimes Just... Stops
The garbage collector's job is to find objects that are no longer reachable and reclaim their memory. Conceptually simple. In practice, there's a moment when it simply cannot do this in the background.
The core problem is that moving an object in memory requires updating every reference to it, and that update cannot be partial.
Picture your GC compacting memory. Object A lives at address 0x100 and needs to move to 0x200. Your application has five threads, each holding a reference to A in a local variable. If the GC starts moving A while your threads keep running, a thread might read the old address 0x100 during the update. That address now points to garbage, or to a different object entirely. The result is a crash or, worse, silent data corruption.
For a classic compacting collector, the straightforward safe solution is to pause all threads first, move the object, update every reference, then resume. This "stop the world" (STW) phase is not a design flaw. It is the price of guaranteeing memory safety without adding a check to every single pointer access in your program. Concurrent collectors such as ZGC pay exactly that per-access cost instead.

Every thread stops. Latency tanks. Then everything resumes like nothing happened.
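To make the reference-update problem concrete, here is a toy mark-compact pass over a simulated heap. It is a sketch under hypothetical assumptions, not how any real JVM lays out memory: the heap is a Python list, an "address" is a list index, `roots` stands in for the threads' local variables, and the whole `compact` call runs as one uninterrupted step, which is the stop-the-world guarantee. Between the reference-rewrite phase and the slide phase, references and object locations disagree, so any thread allowed to run in that window would read the wrong object.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    refs: list = field(default_factory=list)  # addresses (list indices) this object points to


def mark(heap, roots):
    """Return the set of addresses reachable from the roots."""
    live, stack = set(), list(roots)
    while stack:
        addr = stack.pop()
        if addr not in live:
            live.add(addr)
            stack.extend(heap[addr].refs)
    return live


def compact(heap, roots):
    """Toy stop-the-world mark-compact: nothing else may touch heap or roots while this runs."""
    live = mark(heap, roots)
    # Phase 1: assign each live object a forwarding address, sliding survivors to the front.
    forward = {}
    for old_addr in range(len(heap)):
        if old_addr in live:
            forward[old_addr] = len(forward)
    # Phase 2: rewrite every reference (roots and object fields) to the new addresses.
    # Until phase 3 finishes, references and object locations disagree.
    roots[:] = [forward[r] for r in roots]
    for old_addr in forward:
        heap[old_addr].refs = [forward[r] for r in heap[old_addr].refs]
    # Phase 3: slide objects down. Ascending order is safe because new <= old,
    # so no live object is overwritten before it has been moved.
    for old_addr, new_addr in forward.items():
        heap[new_addr] = heap[old_addr]
    del heap[len(forward):]


# Two dead objects interleaved with a live cycle A <-> B, reachable from one root.
heap = [Obj("dead1"), Obj("A", [3]), Obj("dead2"), Obj("B", [1])]
roots = [3]
compact(heap, roots)
print([o.name for o in heap], roots)  # ['A', 'B'] [1]
```

Real collectors do the same three steps over raw memory, which is why they either freeze the application for the duration or instrument every pointer access.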
Minor GC vs. Major GC: Not the Same Bill
Not all pauses are equal. The JVM (and most generational collectors) splits the heap into two regions with very different cost structures.
Young generation (minor GC): Typically 256 MB to 1 GB. Collected frequently. Typical pause: 5 to 50 milliseconds. Fast because the region is small and most objects in it are already dead when the collector arrives.
Old generation (major/full GC): Several gigabytes. Collected rarely. Typical pause: 200 milliseconds to several seconds. Slow because the heap is large, most objects in it survived for a reason (so less gets reclaimed), and compaction has to reorganize a lot of memory.
Here are rough numbers for a JVM process with an 8 GB heap under normal load:
| Metric | Minor GC | Full GC |
|---|---|---|
| Pause duration | 10-30 ms | 500-1500 ms |
| Frequency | Every 5-30 sec | Every 2-4 hours |
| What triggers it | Young gen full | Old gen full or System.gc() |
The minor GC is background noise you learn to live with. The full GC is the one that wakes you up at 3 AM wondering why half your traffic spiked for a second and a half.
The Generational Hypothesis: Most Objects Die Young
Why split the heap at all? Because of an empirical pattern that holds across virtually every real-world program: the vast majority of objects become unreachable within microseconds to milliseconds of being allocated.
In a typical Java or Python program, a commonly cited rule of thumb is that roughly 90-95% of all allocated objects never survive to a second GC cycle.
Think about what happens in a web server. A request arrives, you create a few Request objects, parse some JSON into temporary maps, build a response, and send it back. The entire object graph from that request becomes garbage the moment the response is written. Those objects had a shorter life than a mayfly in a hailstorm.
Only the long-lived things (database connection pools, caches, static configuration) end up in the old generation. Everything else burns bright and vanishes.
Generational GC exploits this by keeping the young generation small and collecting it often. Most garbage is already dead when the collector gets there. Efficient graveyard maintenance, essentially.
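The promotion mechanics can be sketched with a toy two-generation heap. Everything here is a hypothetical model: `ToyGenerationalHeap`, the `alive` flag (standing in for real reachability tracing), and the two-collection tenure threshold are illustrative choices, not any runtime's actual parameters. It simulates one long-lived connection pool plus a thousand per-request temporaries that die immediately.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    alive: bool = True  # stands in for "still reachable from a root"
    age: int = 0        # number of minor GCs this object has survived


class ToyGenerationalHeap:
    """Hypothetical two-generation heap: survivors are promoted after tenure_threshold minor GCs."""

    def __init__(self, young_capacity: int, tenure_threshold: int = 2):
        self.young = []
        self.old = []
        self.young_capacity = young_capacity
        self.tenure_threshold = tenure_threshold
        self.minor_gcs = 0

    def allocate(self, obj: Obj) -> Obj:
        if len(self.young) >= self.young_capacity:
            self.minor_gc()  # young generation is full
        self.young.append(obj)
        return obj

    def minor_gc(self) -> None:
        self.minor_gcs += 1
        survivors = []
        for obj in self.young:
            if not obj.alive:
                continue  # dead objects are simply not copied (a real copying GC never visits them)
            obj.age += 1
            if obj.age >= self.tenure_threshold:
                self.old.append(obj)  # promotion to the old generation
            else:
                survivors.append(obj)
        self.young = survivors


heap = ToyGenerationalHeap(young_capacity=100)
pool = heap.allocate(Obj("connection-pool"))  # long-lived
for i in range(1000):
    tmp = heap.allocate(Obj(f"tmp{i}"))  # per-request temporary
    tmp.alive = False                    # request finished: it is garbage now
print(heap.minor_gcs, [o.name for o in heap.old], len(heap.young))  # 10 ['connection-pool'] 2
```

Ten collections run, yet only the pool is ever promoted: the work of each minor GC scales with what survives, not with how much was allocated.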
Your Allocation Rate Is Probably Your Own Fault
You can't eliminate GC pauses, but you can absolutely influence how often they happen and how long they last.
The primary lever is allocation rate: how many objects you create per second.
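With a young generation of roughly 0.5 to 1 GB, a service allocating 100 MB/sec will trigger a minor GC every 5-10 seconds. At 1 GB/sec the same service collects about once a second, so GC pauses become a constant feature of its latency profile. High allocation doesn't just make pauses more frequent. It also promotes more objects to the old generation: the more often collections run, the more medium-lived objects (for example, data for requests still in flight) happen to be alive at collection time and survive enough cycles to be promoted. That fills old gen faster and triggers expensive full GC sooner.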
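The arithmetic behind those numbers fits in two functions. This is a back-of-the-envelope sketch assuming the young generation fills at a steady rate and every minor pause costs the same fixed time; the 1 GB young generation and 20 ms pause below are illustrative inputs, not measurements.

```python
def minor_gc_interval_s(young_gen_mb: float, alloc_mb_per_s: float) -> float:
    """Seconds between minor GCs, assuming the young generation fills at a steady rate."""
    return young_gen_mb / alloc_mb_per_s


def pause_overhead(pause_ms: float, interval_s: float) -> float:
    """Approximate fraction of wall-clock time spent stopped in minor GC pauses."""
    return (pause_ms / 1000) / interval_s


# Illustrative inputs only: a 1 GB young generation and a 20 ms minor pause.
for rate in (100, 1000):
    interval = minor_gc_interval_s(1000, rate)
    print(f"{rate} MB/s -> minor GC every {interval:.1f} s, "
          f"{pause_overhead(20, interval):.1%} of time paused")
# 100 MB/s -> minor GC every 10.0 s, 0.2% of time paused
# 1000 MB/s -> minor GC every 1.0 s, 2.0% of time paused
```

A tenfold increase in allocation rate produces a tenfold increase in time spent paused, before counting any extra promotion to the old generation.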
Code patterns that create unnecessary allocation pressure:
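```python
words = ["stop", "the", "world"]  # example input

# Each += may build a brand-new string and copy everything accumulated so far
result = ""
for word in words:
    result += word  # n new string objects, up to O(n²) bytes copied in total

# Better: join sizes the result once and copies each word exactly once
result = "".join(words)
```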
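```java
// Allocates on every call: a varargs array, a List wrapper, an empty String[0], and the result array
String[] tags = Arrays.asList("a", "b", "c").toArray(new String[0]);

// Better: build the array once as a constant (arrays stay mutable, so treat it as read-only)
private static final String[] TAGS = {"a", "b", "c"};
```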
Long-lived references are a separate problem. If you cache objects forever without eviction, you prevent the GC from reclaiming them. The objects survive into the old generation, accumulate there, and eventually a full GC has to scan a much larger live set than necessary. Unbounded caches are one of the most common causes of premature full GC in production Java services. They look like performance wins right up until they become 1,200 ms pauses at peak traffic.
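Bounding a cache is usually a few lines of code. The sketch below is a hypothetical `BoundedLRUCache` built on the standard library's `collections.OrderedDict`; for memoizing function results, Python's built-in `functools.lru_cache(maxsize=...)` does the same job. The entries still live long enough to be promoted, but the cap keeps the old-generation footprint from growing without limit.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """A cache that evicts its least recently used entry once it exceeds max_entries items."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._data)


cache = BoundedLRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # evicts "b"
print(len(cache), cache.get("b"), cache.get("a"))  # 2 None 1
```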
Modern GC: Sub-Millisecond Is a Real Thing Now
The JVM's default collector has evolved significantly. G1, the default collector since Java 9, targets 200 ms pause times by default (`-XX:MaxGCPauseMillis`), and you can tune that lower. Java 17+ ships with a production-ready ZGC (enabled with `-XX:+UseZGC`), which is designed to keep pauses under a millisecond regardless of heap size. Shenandoah achieves similarly short pauses with a different implementation.
ZGC works by doing almost everything concurrently with the application. Its few STW pauses (at the start and end of marking, and at the start of relocation) do only a small, bounded amount of work, typically well under 1 millisecond even on multi-GB heaps. Compaction and reference updating happen concurrently: ZGC inserts a load barrier, a small check on every reference loaded from the heap, so a thread that picks up a stale pointer to a relocated object gets the object's current address instead, and the stale pointer is repaired on the spot.
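The load-barrier idea can be modeled in a few lines. This is a single-threaded toy under loud assumptions: `ToyHeap`, its dict-based memory, and the forwarding table are hypothetical stand-ins. Real ZGC detects stale pointers with metadata bits stored in the pointer itself (so-called colored pointers) and uses atomic operations to resolve races between collector and application threads, none of which this sketch attempts.

```python
class ToyHeap:
    """Hypothetical model of concurrent relocation guarded by a load barrier."""

    def __init__(self):
        self.memory = {}      # address -> object payload
        self.forwarding = {}  # old address -> new address, recorded by the collector
        self.barrier_fixups = 0

    def relocate(self, old: int, new: int) -> None:
        """Collector side: move one object and record where it went. Mutators are not stopped."""
        self.memory[new] = self.memory.pop(old)
        self.forwarding[old] = new

    def load(self, holder: dict, field: str):
        """Mutator side: every reference load from `holder` goes through this check."""
        addr = holder[field]
        if addr in self.forwarding:  # stale pointer to an object that has moved
            addr = self.forwarding[addr]
            holder[field] = addr     # heal the reference so later loads skip the fix-up
            self.barrier_fixups += 1
        return self.memory[addr]


heap = ToyHeap()
heap.memory[0x100] = "object A"
thread_local = {"a": 0x100}  # a thread still holds the old address
heap.relocate(0x100, 0x200)  # the collector moves A while the thread keeps running
print(heap.load(thread_local, "a"), hex(thread_local["a"]), heap.barrier_fixups)
# object A 0x200 1
```

The cost shows up in `load`: every reference read pays for the membership check, whether or not anything moved.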
Go's garbage collector takes a similar concurrent approach. It targets sub-millisecond pauses by running frequent short concurrent cycles rather than infrequent long stops, and it never moves objects (it is a non-generational, non-compacting mark-sweep collector), which sidesteps the reference-update problem altogether.
The tradeoff is overhead. Barriers add a check to pointer operations, and by rough estimates that can cost on the order of 10-20% in throughput, depending heavily on workload. If you never hit GC pause problems, a simpler collector with occasional stops may actually be faster overall. ZGC is the right choice when your SLA cares about p99 latency; Parallel GC may be faster for throughput-optimized batch jobs. Pick your poison based on what actually matters in your system.
Python Is Different (But Not Off the Hook)
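Python's primary memory management (in CPython, the reference implementation) is reference counting, not tracing GC. When an object's reference count reaches zero, it is freed immediately. No pause, no background thread, no batch collection.
Reference counting never stops the world. The cost of freeing an object is paid incrementally, one reference-count decrement at a time. The one latency caveat: dropping the last reference to a very large structure frees everything it owns in a single burst, on whichever thread dropped it.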
The catch is circular references. If object A holds a reference to B and B holds a reference to A, their counts never reach zero even if nothing else can reach them. Python ships with a cyclic garbage collector to handle exactly this case. It scans for cycles periodically, and when it does, it pauses.
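Both behaviors are easy to observe in CPython with the standard library's `weakref.finalize`, which runs a callback when its target is reclaimed. In this minimal sketch (`Node` and `freed` are illustrative names), the cyclic collector is disabled so that only reference counting can act until `gc.collect()` is called explicitly. Other implementations such as PyPy do not use reference counting, so the first result is CPython-specific.

```python
import gc
import weakref


class Node:
    """Plain object; instances of user-defined classes support weak references by default."""


freed = []
gc.disable()  # keep the cyclic collector out of the way so only refcounting acts

# 1. No cycle: the last `del` drops the count to zero and the object is freed at once.
solo = Node()
weakref.finalize(solo, freed.append, "solo")
del solo
print(freed)  # ['solo']

# 2. A cycle: a and b keep each other's counts above zero after the names are deleted.
a, b = Node(), Node()
a.partner, b.partner = b, a
weakref.finalize(a, freed.append, "cycle")
del a, b
print(freed)  # ['solo']  (unreachable, but still allocated)

gc.collect()  # the cyclic collector finds and frees the unreachable pair
print(freed)  # ['solo', 'cycle']
gc.enable()
```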
```python
import gc

# Disable automatic cyclic GC in latency-critical paths
gc.disable()
try:
    ...  # do your latency-sensitive work here
finally:
    # Manually collect during a low-traffic window, then restore automatic GC
    gc.collect()
    gc.enable()
```
This is a real pattern in production Python services. Disabling automatic cycle collection avoids unpredictable pauses. The tradeoff is that circular references accumulate until you collect manually. If your code creates circular structures (certain class hierarchies, doubly-linked lists with back-references), leaving gc disabled for too long causes memory growth. There is no free lunch. There never is.
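One common way to design the cycle out of a doubly-linked list is to make the back-reference weak using the standard library's `weakref` module, so only the forward links hold strong references. A minimal sketch: `ListNode` and `build` are hypothetical names, and the collector is disabled to show that reference counting alone reclaims the whole chain.

```python
import gc
import weakref


class ListNode:
    """Doubly-linked node whose back-pointer is weak, so next/prev never form a cycle."""

    def __init__(self, value):
        self.value = value
        self.next = None   # strong reference forward
        self._prev = None  # weak reference backward (or None)

    @property
    def prev(self):
        return self._prev() if self._prev is not None else None

    @prev.setter
    def prev(self, node):
        self._prev = weakref.ref(node) if node is not None else None


def build(values):
    head = tail = None
    for v in values:
        node = ListNode(v)
        if tail is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head


gc.disable()                         # cyclic GC off: only refcounting can free the list
head = build(range(3))
print(head.next.prev.value)          # 0  (the back-pointer still works)
probe = weakref.ref(head.next.next)  # watch the last node
del head                             # refcounting alone frees the whole chain
print(probe() is None)               # True
gc.enable()
```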
Why This Comes Up in Coding Interviews
GC pauses show up in interviews in two distinct ways.
The first is language gotchas. If you write a Java solution that calls System.gc() explicitly, an interviewer will notice. System.gc() is a hint to the JVM, not a command. The JVM can ignore it. Calling it in application code signals that you're trying to micromanage something the runtime handles better than you do, which is roughly the impression of someone telling a chef how to hold a knife. Similarly in Python, creating massive temporary data structures inside a loop and relying on GC to clean up works, but accumulates allocation pressure in ways that compound.
The second context is system design. If you're designing a latency-sensitive service, you need to account for GC in your p99 latency estimates. A Java service using the default GC with a 4 GB heap will periodically see 200-500 ms pauses. If your SLA is 100 ms p99, that's a design flaw, not a tuning problem. The solution is either a low-pause GC (ZGC, Shenandoah), reducing allocation rate, or using a different runtime for that component.
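Whether a periodic pause actually breaks a p99 target depends on how often it recurs, not only on how long it lasts. A back-of-the-envelope sketch, assuming requests arrive uniformly over time and any request that arrives during a pause is delayed by it (both simplifications); the two scenarios are hypothetical inputs.

```python
def fraction_of_requests_delayed(pause_s: float, interval_s: float) -> float:
    """Rough share of requests that land inside a GC pause, assuming uniform arrivals."""
    return min(1.0, pause_s / interval_s)


def pause_shows_at_p99(pause_s: float, interval_s: float) -> bool:
    """True when more than 1% of requests are expected to land inside a pause."""
    return fraction_of_requests_delayed(pause_s, interval_s) > 0.01


# Hypothetical scenarios: a 300 ms pause every 10 s vs. a 1.5 s full GC every 2 hours.
print(pause_shows_at_p99(0.3, 10))        # True  (about 3% of requests delayed)
print(pause_shows_at_p99(1.5, 2 * 3600))  # False (about 0.02%, still visible at p99.99)
```

This is why naming the collector is not enough on its own: a design discussion should state both the pause length and the pause frequency.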
The interview-relevant checklist:
- Short-lived objects in hot paths increase GC frequency. Use object pools or reuse buffers where possible.
- Unbounded long-lived caches fill the old generation fast. Bound them.
- `System.gc()` in Java is almost never appropriate in application code.
- Python circular references are reclaimed only by the cyclic GC; if you disable it, they need an explicit `gc.collect()` or a design that avoids cycles.
- When discussing latency guarantees in system design, name the GC algorithm your runtime uses and its pause characteristics.
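Buffer reuse in Python typically means reading into a preallocated `bytearray` with `readinto`, which file objects and `io.BytesIO` provide, instead of calling `read`, which returns a fresh `bytes` object on every call. A minimal sketch: `checksum_stream` and its 16-bit checksum are hypothetical, chosen only to give the loop something to compute.

```python
import io


def checksum_stream(stream, chunk_size: int = 4096) -> int:
    """Sum all bytes mod 2**16, reusing one preallocated buffer instead of a new bytes per read."""
    buf = bytearray(chunk_size)  # allocated once, reused for every chunk
    view = memoryview(buf)       # lets us slice without copying the buffer
    total = 0
    while True:
        n = stream.readinto(buf)  # fills buf in place; returns 0 at end of stream
        if not n:
            break
        total = (total + sum(view[:n])) % 65536
    return total


data = bytes(range(256)) * 40
print(checksum_stream(io.BytesIO(data)) == sum(data) % 65536)  # True
```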
If you want to practice explaining memory management and GC trade-offs out loud, under actual interview conditions, SpaceComplexity runs voice-based mock interviews with rubric-based feedback that covers exactly these kinds of design reasoning questions.
Related Reading
- What Is Garbage Collection? explains the broader mechanics of tracing collection, including mark-sweep and mark-compact algorithms.
- How Garbage Collection Works goes deeper on allocation patterns and the write barrier overhead that concurrent GC requires.
- What Is Python Reference Counting? covers CPython's primary memory mechanism and why it avoids most GC pauses.
- What Is a Memory Leak? explains what happens when references accumulate faster than GC can reclaim them.
Further Reading
- Java ZGC Documentation, Oracle's official ZGC tuning guide with pause time guarantees and configuration options.
- Go GC Guide, the official Go garbage collection guide explaining the concurrent collector design and the GOGC/GOMEMLIMIT levers.
- Python gc Module Documentation, CPython's cyclic garbage collector API, including when to call `gc.collect()` and what `gc.disable()` actually does.
- Getting to know the G1 Garbage Collector, Oracle's deep dive on G1GC region-based collection and pause target tuning.
- A Unified Theory of Garbage Collection (Bacon et al., OOPSLA 2004), the academic paper showing that tracing GC and reference counting are duals of each other, useful background for understanding why both approaches exist.