Python Reference Cycles: The Memory Leak Two Garbage Collectors Exist to Catch

- Reference counting frees objects immediately when their count hits zero, but two objects pointing at each other keep each other's count above zero indefinitely.
- Python's cyclic GC is a separate, generational mark-and-sweep collector that identifies unreachable islands of container objects that reference counting can't reclaim.
- Weak references break cycles without preventing cleanup:
weakref.ref()in Python,Weak<T>in Rust,weak_ptrin C++, andweak varin Swift. - Backpointers create cycles: doubly linked lists, trees with
parentpointers, and any object that holds a reference back to its container all hit this problem. - Java, Go, and C# use tracing GCs exclusively, so cycles are handled automatically at the cost of non-deterministic, occasionally stop-the-world collection.
- PEP 442 (Python 3.4) fixed safe finalization ordering so
__del__objects inside cycles are no longer permanently leaked. - The
gcmodule providesgc.collect(),gc.get_count(), andgc.get_referrers()for diagnosing cycle leaks in production Python.
Reference counting is elegant. Each object carries a counter: the number of live references pointing at it. When that counter drops to zero, the memory is freed immediately. No background thread, no pause, no guesswork.
Until two objects point at each other.
If A holds a reference to B, and B holds a reference to A, neither counter ever reaches zero. Both objects stay alive indefinitely, even if the rest of your program has completely forgotten they exist. The memory just sits there, occupied, forever, like a houseguest who technically never received a checkout notice.
That's a reference cycle, and it's the oldest known weakness of reference counting.
Fast, Deterministic, and Blind to Cycles
Every object in a reference-counted runtime has an integer field: its reference count. The count starts at 1 when the object is created, increments each time a new reference is assigned, and decrements when a reference goes away. When the count hits zero, the runtime frees the object immediately.
In CPython, that field is ob_refcnt on every PyObject. You can inspect it directly:
import sys x = [] print(sys.getrefcount(x)) # 2: one for x, one for the getrefcount argument y = x print(sys.getrefcount(x)) # 3 del y print(sys.getrefcount(x)) # 2
The same mechanism drives Swift's ARC, Rust's Rc<T>, C++'s std::shared_ptr, and PHP's runtime. Fast, deterministic, and correct for the 99% case.
The other 1% is cycles. Naturally.
Two Objects Walk Into a Cycle
class Node: def __init__(self, val): self.val = val self.next = None a = Node(1) b = Node(2) a.next = b # b's refcount: 2 (b + a.next) b.next = a # a's refcount: 2 (a + b.next) del a # a's refcount drops to 1 (still held by b.next) del b # b's refcount drops to 1 (still held by a.next) # Both objects are unreachable. Both have refcount 1. Neither is freed.
After del a and del b, no live variable in your program can reach either node. They are, for all practical purposes, dead. Useless. Gone from your mental model.
And yet they are not freed. CPython looks at each one, sees a refcount of 1, concludes someone must still care, and leaves them alone. They will sit in memory until the process exits. Like two servers in a data center that nobody decommissioned because everyone assumes someone else is using them.
This is not a corner case. Any data structure with backpointers creates cycles, and backpointers are everywhere.
Where Python Reference Cycles Actually Show Up
Doubly linked lists
Every node holds prev and next. That's two cycles per node in a list with at least two elements. The LRU cache implementation you'll write in interviews uses exactly this structure. Every node you create is a small cycle. In CPython, memory reclaim relies on the cyclic GC, not on reference counting.
Trees with parent pointers
class TreeNode: def __init__(self, val): self.val = val self.children = [] self.parent = None # the culprit root = TreeNode("root") child = TreeNode("child") child.parent = root # child -> root root.children.append(child) # root -> child # cycle exists
Every node in this tree participates in a cycle with its parent. This pattern comes up constantly in interview problems (path-to-root, LCA), so it's worth knowing what happens to the memory when you're done with the tree and just expect it to vanish.
Anything that owns its owner
A class that stores a reference to the collection it lives in. An event handler that captures self. A closure over a variable that holds the closure. These are everywhere in real code and most engineers only discover they're leaking memory after a weekend on-call shift.
How Python Handles It: Two GCs for the Price of One
CPython ships a cyclic garbage collector as a secondary pass, entirely separate from reference counting. Reference counting handles the fast path. The cyclic GC handles the residue. This is not the cleanest architecture. It's the necessary one.
The cyclic GC is generational. Objects are sorted into three generations (0, 1, 2) by survival count. New container objects start in generation 0 and are collected frequently. Survivors get promoted to generation 1, then 2, and collected much less often. Long-lived objects are less likely to be garbage, so you don't waste time rescanning them every cycle.
import gc gc.get_threshold() # (700, 10, 10): generation 0 runs every 700 allocs gc.get_count() # current counts per generation gc.collect() # trigger a manual pass, returns number of unreachable objects gc.collect(0) # collect only generation 0
The detection algorithm is a simplified mark-and-sweep applied only to container objects (lists, dicts, sets, class instances). It identifies "islands": groups of objects whose only remaining references are internal to the group. Those get freed.
You can turn the cyclic GC off with gc.disable() if you guarantee your code creates no cycles. Some performance-sensitive applications do this. The documentation does not elaborate on what happens if you're wrong.
The Fix: Break the Cycle Before It Forms
A weak reference points at an object without incrementing its reference count. When the referent's count drops to zero, the object is freed and the weak reference becomes None. The cycle is broken because one direction of the loop stopped counting.
import weakref class TreeNode: def __init__(self, val): self.val = val self.children = [] self._parent = None @property def parent(self): return self._parent() if self._parent is not None else None @parent.setter def parent(self, node): self._parent = weakref.ref(node) if node is not None else None
Weak references are the idiomatic fix for backpointers. Child-to-parent, observer-to-subject, cache-entry-to-original: wherever ownership is clear and the back direction is navigation only, use a weak reference. The parent owns the child. The child does not own the parent. Make the reference count reflect that.
The pattern is identical across reference-counted languages:
// Rust: Rc<T> is strong, Weak<T> is the non-owning version use std::rc::{Rc, Weak}; use std::cell::RefCell; struct Node { val: i32, parent: Option<Weak<RefCell<Node>>>, // weak: doesn't hold parent alive children: Vec<Rc<RefCell<Node>>>, // strong: children are owned } // Downgrade to create a weak ref; upgrade to try to access it let parent_weak = Rc::downgrade(&parent_rc); if let Some(p) = parent_weak.upgrade() { println!("parent still alive: {}", p.borrow().val); }
// C++: shared_ptr is strong, weak_ptr is the non-owning version #include <memory> struct Node { int val; std::weak_ptr<Node> parent; // doesn't extend lifetime std::vector<std::shared_ptr<Node>> children; // owns the children }; if (auto p = node->parent.lock()) { std::cout << p->val << std::endl; }
Swift's ARC works the same way. strong (the default) increments the count. weak does not.
class TreeNode { var val: Int var children: [TreeNode] = [] weak var parent: TreeNode? // 'weak' breaks the cycle init(_ val: Int) { self.val = val } }
What About Java, Go, and C#?
These languages use tracing garbage collectors exclusively. A tracing GC starts from roots (stack variables, globals) and walks every reachable reference. Objects it never visits get freed. Cycles are handled automatically: if two unreachable objects point at each other, neither gets visited, both get freed together.
Java developers do not think about reference cycles. At all. They go home at 5pm.
The tradeoff is non-deterministic collection and occasional stop-the-world pauses. CPython's reference counting is deterministic and immediate for the common case. Both are real tradeoffs. Pick your poison.
Understanding how garbage collection works makes the choice clearer: reference counting is O(1) per allocation but can't handle cycles alone, while tracing GC handles everything but pauses occasionally.
What the Leak Actually Costs
For the common case, reference counting overhead is O(1) per reference mutation. Fast.
Python's cyclic GC runs in O(n) time relative to tracked container objects, but generational design amortizes this well. Generation 0 runs frequently on a small, young set. Generations 1 and 2 are rarely touched. GC pauses in CPython are usually under a few milliseconds for typical workloads.
The key mental model: leaked objects from a cycle occupy exactly the same memory as they would if they were live. No compression, no sharing. A leaked graph with 10,000 nodes uses the same memory as a live graph with 10,000 nodes. Call gc.collect() explicitly if you need deterministic cleanup after building large cyclic structures.
The __del__ Footgun (Now Defused)
In Python 2, objects with a __del__ method were exempt from the cyclic GC. If such an object was part of a cycle, it leaked forever, because the GC didn't know in what order to call the destructors. Python 2's answer to this problem was essentially: your problem.
Python 3.4 fixed this with PEP 442, which defined safe finalization ordering. Objects with __del__ are now handled correctly by the cyclic GC. If you're in Python 3, this is solved. If you're reading old documentation that mentions gc.garbage, that's what it was tracking: a list of cycle members that couldn't be freed because of unrunnable destructors.
Why This Matters in Coding Interviews
Three places this comes up.
Designing data structures with backpointers. If you're asked to implement an LRU cache in Python, you'll build a doubly linked list with prev and next on every node. That's cycles everywhere. Mentioning that Python's cyclic GC handles this, or that you'd use weak references in a performance-sensitive setting, shows you think about memory rather than just about passing the test case.
System design questions about memory management. If you're asked why Python's GC has two phases, or why Swift requires the weak keyword to break cycles in view controllers, the answer is the same story: reference counting is fast but can't handle cycles, so the runtime adds a second mechanism on top.
Debugging memory leaks in Python. The gc module lets you inspect what's living in each generation, find referrers, and force collection. In a production debugging scenario, gc.get_referrers(obj) tells you exactly what's keeping an object alive. That's the kind of answer that ends an on-call incident.
Reference counting and garbage collection are not opposites. CPython uses both. Most modern runtimes use both in some form. Knowing when each applies, and what each handles well, is the kind of depth that stands out in a senior engineering interview.
If you want to practice explaining memory management under pressure, SpaceComplexity runs voice-based mock interviews with real-time feedback on how clearly you're communicating.