What Is a Free List? The Memory Trick Behind Fast Allocators

Every time your program calls malloc or new, there's a chance it knocks on the operating system's door. That round trip costs microseconds. Not milliseconds. Microseconds. And if you're doing it thousands of times a second on a hot path, those microseconds add up into something you'll notice in a flame graph at 2am.

A free list is a linked list of already-allocated memory blocks sitting idle, waiting to be handed back out without asking the OS for anything. When you free a block, it goes onto the list instead of back to the kernel. When you need a block, you pop the list instead of calling the allocator. Both operations are O(1) and entirely in userspace.

This is not some obscure academic trick. It runs inside CPython, the Linux kernel, the JVM, every serious game engine, and tcmalloc, the allocator behind much of Google's production infrastructure. You've been benefiting from it for years without knowing.

Why `malloc` Is Slower Than You Think

The general-purpose allocator (malloc on Linux, the Windows heap) has to handle every possible size and lifetime pattern simultaneously. That forces it into a lot of work: splitting large blocks to satisfy small requests, merging adjacent free blocks to prevent fragmentation, maintaining metadata headers next to every allocation.

Two problems hit hard when you allocate many small objects of the same size rapidly.

System call overhead. If the allocator runs out of room in its internal pools, it calls brk or mmap to get more from the OS. Those are expensive. Even when it services the request from its own pool, it still does bookkeeping that takes hundreds of nanoseconds per call. Think of it like calling your landlord every time you want a glass of water. Technically works. Not what you want on a hot path.

External fragmentation. A general allocator that handles mixed sizes eventually develops holes: gaps of free memory too small for any pending request. Over a long-running server, fragmentation quietly wastes gigabytes. Your process RSS climbs. Nothing you can do about it.

A free list sidesteps both. For fixed-size allocations, there is no fragmentation, no merging, and no metadata overhead. Every block is the same size. Allocation is one pointer write. That's it.

The Embedded Pointer Trick (This Is the Good Part)

A free block is, by definition, not in use. Its data region contains nothing your program cares about. So instead of keeping a separate structure somewhere to track which blocks are available, you store the pointer to the next free block inside that data region itself.

No extra memory. No separate array of free-block indices. The pointer lives where the data would live, and disappears the moment the block gets allocated. It's beautifully hacky in the way that only low-level systems code can be.

Visually, a three-block pool looks like this before any allocation:

Memory pool: [ ptr→B | ... ] [ ptr→C | ... ] [ NULL | ... ]
              Block A          Block B          Block C
                ↑
              head

Allocate: pop the head. Block A is returned to the caller, head moves to B.

[ in use     ] [ ptr→C | ... ] [ NULL | ... ]
  Block A        Block B          Block C
                   ↑
                 head

Free Block A: push it back. Block A's first word becomes a pointer to the current head.

[ ptr→B | ... ] [ ptr→C | ... ] [ NULL | ... ]
  Block A          Block B          Block C
    ↑
  head

Two pointer writes per operation, total. Nothing else. The pointer that held your data a moment ago is now plumbing.

Implementing One in C

The C implementation is the clearest, because C doesn't hide anything:

typedef struct Block {
    struct Block *next;
} Block;

typedef struct {
    Block *head;
} FreeList;

void free_list_init(FreeList *fl, void *pool, size_t block_size, int count) {
    fl->head = NULL;
    char *p = (char *)pool;
    for (int i = count - 1; i >= 0; i--) {
        Block *b = (Block *)(p + (size_t)i * block_size);
        b->next = fl->head;
        fl->head = b;
    }
}

void *free_list_alloc(FreeList *fl) {
    if (!fl->head) return NULL;
    Block *b = fl->head;
    fl->head = b->next;
    return b;
}

void free_list_free(FreeList *fl, void *ptr) {
    Block *b = (Block *)ptr;
    b->next = fl->head;
    fl->head = b;
}

Block size must be at least sizeof(void*), 8 bytes on a 64-bit machine. When allocated, the caller overwrites that first word with whatever it needs. The next-pointer just... ceases to exist. Until the block is freed again, at which point it comes back to life.

A Python Object Pool

Python's pymalloc uses exactly this strategy internally for objects smaller than 512 bytes. At a higher level, you can express the same idea as an object pool and get most of the benefit without touching C:

class NodePool:
    def __init__(self, size=256):
        self._free = [{"val": None, "next": None} for _ in range(size)]

    def acquire(self):
        if self._free:
            node = self._free.pop()
            node["val"] = None
            node["next"] = None
            return node
        return {"val": None, "next": None}

    def release(self, node):
        self._free.append(node)

Python lists are not singly linked lists under the hood, but the semantics are identical: pop to allocate, append to free, O(1) amortized for both. The pool absorbs the GC pressure on hot paths where you'd otherwise be creating and destroying hundreds of nodes per second, each one tickling the garbage collector.

How Fast Is It, Really?

Operation	Time
Allocate	O(1)
Free	O(1)
Initialize	O(n)

Space overhead is zero at runtime. The next-pointer occupies the block's own data region. A 64-byte block gives you 64 bytes of usable storage when allocated. A pool of 1,000 such blocks uses exactly 64,000 bytes, nothing hidden.

The constraints are real though:

Block size is fixed. A 64-byte free list wastes 60 bytes when you need only 4, and fails completely when you need 128.
Pool size is fixed. When you exhaust the pool, you either return null or fall back to the general allocator.
Thread safety requires coordination. A shared free list needs a lock or compare-and-swap on the head pointer. Per-thread free lists avoid this entirely.

Production allocators solve the size constraint with size classes: a separate free list for each size range (8B, 16B, 32B, 64B, ...). tcmalloc uses 86 size classes. Each allocation goes to the smallest class that fits, keeping internal fragmentation low while preserving O(1) allocation across the full small-object range.

Where Free Lists Actually Run

CPython. Python's pymalloc maintains a collection of pools, each subdivided into fixed-size blocks. Free blocks in each pool are linked together. Object creation below 512 bytes almost never touches the OS allocator. This is why creating millions of small Python objects is faster than you'd expect from a language that does garbage collection on everything. CPython is sandbagging.

Linux kernel slab allocator. The kernel allocates common objects like task_struct, inode, and sk_buff from slab caches. Each cache is a collection of pages divided into fixed-size objects with a free list. A new process creation pops a pre-initialized task_struct off the slab rather than zeroing a raw page.

Java JVM. Young-generation allocations in HotSpot go into a thread-local allocation buffer (TLAB): a chunk of heap pre-assigned to one thread. Allocations within a TLAB are a pointer bump, with free list semantics at the TLAB level when TLABs are recycled. The JVM is essentially running a free list at every level and then telling you GC is magic.

tcmalloc and jemalloc. Google's tcmalloc (used in Chrome, Go's runtime, and large chunks of Google's production infrastructure) keeps per-thread free lists for small objects. The common-case allocation path touches no locks. Jemalloc (used in Firefox, FreeBSD's libc, and Meta's systems) uses a similar per-thread arena model. Lock-free, fast, everywhere.

Game engines. Real-time rendering budgets 16ms per frame. A garbage collection pause or an unexpected malloc latency spike shows up immediately as a dropped frame, and your players notice before you do. Game engines use custom allocators built on free lists for particles, audio events, render commands, and anything that appears and disappears at high frequency. new is basically banned in hot paths.

Why This Comes Up in Interviews

Free lists rarely appear as the direct problem statement. They show up as the right answer to a deeper question.

LRU cache. The standard LRU cache implementation is a hash map plus a doubly linked list. In a production implementation you'd maintain a free list of node objects and reuse them across put operations rather than allocating and garbage-collecting on every eviction. Mentioning this signals you think about allocation cost, not just algorithmic complexity.

Memory allocator design. System design interviews at infrastructure companies sometimes ask you to design a fixed-size object allocator or a slab cache. A free list is the core mechanism: pre-allocate a slab, link the blocks, O(1) allocate and free. Layer per-thread lists on top for lock-free access.

Garbage collection discussions. Understanding that garbage collectors use free lists internally sharpens your answers about managed language performance. When an interviewer asks why GC pauses happen, you can explain that the GC must scan live objects and reconstruct free lists, not just flip a flag. That answer comes from somewhere.

"Why is this slow?" If an interviewer walks you through a system where object creation is on the hot path and asks what's wrong, knowing that each malloc is a potential OS call, and that free lists avoid it entirely, is the kind of answer that lands.

Memory leaks and dangling pointers often trace back to manual free list management going wrong. Double-free corrupts the list structure itself. Use-after-free reads a block that was reallocated to something else entirely. Knowing the failure modes shows you've thought about the full lifecycle, not just the happy path.

The pattern is everywhere once you see it. The object pool, the connection pool in database drivers, the thread pool in a web server: these are all free lists with different names. Pre-allocate a fixed set of resources. Lend them out. Take them back. Never touch the OS for the common case.

If you want to practice explaining these ideas out loud under actual interview pressure, SpaceComplexity runs voice-based mock interviews that cover exactly this kind of systems knowledge, with rubric-based feedback on whether your reasoning is landing.