What Is a Memory Pool? The Allocator That's Faster Than malloc

You call malloc once. Fine. You call it two hundred thousand times per second, allocating small nodes for a parser, message objects for a broker, or request buffers in a web server. Now you're spending a meaningful fraction of your runtime watching the allocator hunt through fragmented free lists, juggle its own bookkeeping headers, and occasionally wake up the OS just to hand you a 32-byte object you're going to throw away in 10 milliseconds.

Instead of delegating every object to the general-purpose allocator, a memory pool grabs a large block of memory up front and carves it up itself. Allocation becomes pointer arithmetic. Freeing the whole batch is one call. Your objects sit contiguously in memory, so the hardware prefetcher actually gets to do its job. Production systems from nginx to Python's runtime to most game engines you've ever installed use this technique. They didn't stumble onto it by accident.

Two Designs, Two Different Problems

A memory pool (also called an arena or a region) is a pre-allocated contiguous block that you subdivide into smaller chunks. The two dominant designs handle different situations.

Bump-pointer (arena) allocators. You keep one pointer starting at the front of the block. Each allocation advances the pointer by the requested size. Freeing individual objects is not supported. You free the entire arena at once when you're done with everything in it. This is the right design when a batch of objects is created together and discarded together, like all allocations during a single HTTP request.

Fixed-size (object) pools. You divide the block into slots of equal size and maintain a free list of available slots. Allocating pops one slot off the head of the list. Freeing pushes the slot back. Individual deallocation is supported, but only for objects of the declared size. This is the right design for a single hot type you create and destroy constantly, like network packets or scene nodes in a game.

Many production systems use both: an arena for heterogeneous short-lived batches, and typed object pools for specific high-frequency objects.

Pointer Arithmetic, Nothing More

Here's a bump-pointer arena in Python:

class Arena:
    def __init__(self, capacity: int):
        self.buffer = bytearray(capacity)
        self.offset = 0

    def allocate(self, size: int) -> memoryview:
        if self.offset + size > len(self.buffer):
            raise MemoryError("arena exhausted")
        chunk = memoryview(self.buffer)[self.offset : self.offset + size]
        self.offset += size
        return chunk

    def reset(self):
        self.offset = 0

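Allocation is two operations: a bounds check and a pointer bump. Resetting the arena is a single assignment. There is no per-object free. In C or C++ the same logic compiles down to a compare and an add; the Python version illustrates the bookkeeping rather than the speed, since every memoryview it returns is itself a small heap object.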
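The arena above hands out chunks at whatever byte offset comes next. Native arenas usually round each allocation up to an alignment boundary first. Here is a minimal sketch of that variant; the name AlignedArena and the 8-byte default are assumptions for illustration, and the alignment is relative to the start of the buffer, not to a real machine address.

```python
class AlignedArena:
    """Bump-pointer arena that rounds each allocation up to an alignment boundary."""

    def __init__(self, capacity: int, alignment: int = 8):
        # Hypothetical default of 8; the bit trick below needs a power of two.
        if alignment <= 0 or alignment & (alignment - 1):
            raise ValueError("alignment must be a power of two")
        self.buffer = bytearray(capacity)
        self.view = memoryview(self.buffer)  # created once, sliced per allocation
        self.alignment = alignment
        self.offset = 0

    def allocate(self, size: int) -> memoryview:
        if size < 0:
            raise ValueError("size must be non-negative")
        # Round the current offset up to the next multiple of the alignment.
        start = (self.offset + self.alignment - 1) & ~(self.alignment - 1)
        end = start + size
        if end > len(self.buffer):
            raise MemoryError("arena exhausted")
        self.offset = end
        return self.view[start:end]

    def reset(self) -> None:
        self.offset = 0


arena = AlignedArena(64, alignment=8)
a = arena.allocate(3)   # occupies bytes 0..2
b = arena.allocate(4)   # starts at byte 8, not byte 3
a[:] = b"abc"
b[:] = b"wxyz"
```

The five skipped bytes between the two chunks are the price of alignment: a small amount of internal fragmentation in exchange for chunks that start on predictable boundaries.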

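Here's a fixed-size object pool in C++. The slot union is the key insight: a free slot stores a pointer to the next free slot inside the same memory that will hold an object when the slot is occupied. That means no separate bookkeeping per slot, as long as T is at least as large as a pointer (smaller types get padded up to pointer size).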

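#include <cstddef>
#include <new>

template <typename T, std::size_t Capacity>
class ObjectPool {
    static_assert(Capacity > 0, "pool needs at least one slot");

    // A slot holds either a live T or, while free, a link to the next free slot.
    union Slot {
        T object;
        Slot* next;
        Slot() : next(nullptr) {}  // user-provided so the union compiles for non-trivial T
        ~Slot() {}                 // the pool destroys T explicitly in deallocate()
    };

    Slot storage[Capacity];        // the union is already aligned for both T and Slot*
    Slot* free_head;

public:
    ObjectPool() {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i + 1 < Capacity; ++i)
            storage[i].next = &storage[i + 1];
        storage[Capacity - 1].next = nullptr;
        free_head = &storage[0];
    }

    // The free list stores addresses inside storage, so a copy would alias it.
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    T* allocate() {
        if (!free_head) return nullptr;   // pool exhausted
        Slot* slot = free_head;
        free_head = slot->next;
        return new (&slot->object) T{};   // placement-new starts the T's lifetime
    }

    void deallocate(T* ptr) {
        if (!ptr) return;
        ptr->~T();                        // end the T's lifetime before reusing the bytes
        Slot* slot = reinterpret_cast<Slot*>(ptr);
        slot->next = free_head;
        free_head = slot;
    }
};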
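The same in-slot free-list trick can be sketched in Python, with slot indices standing in for pointers. This is an illustration under assumptions, not a port of the C++ class: SlotPool, its method names, and the 8-byte link encoding are all made up for the example.

```python
import struct


class SlotPool:
    """Fixed-size pool over one bytearray; a free slot stores the next free index in place."""

    _LINK = struct.Struct("<q")  # assumed encoding: 8-byte signed index, -1 ends the list

    def __init__(self, slot_size: int, capacity: int):
        if slot_size < self._LINK.size:
            raise ValueError("slot must be large enough to hold a free-list link")
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.slot_size = slot_size
        self.capacity = capacity
        self.buffer = bytearray(slot_size * capacity)
        # Thread every slot onto the free list: slot i links to slot i + 1.
        for i in range(capacity):
            nxt = i + 1 if i + 1 < capacity else -1
            self._LINK.pack_into(self.buffer, i * slot_size, nxt)
        self.free_head = 0

    def allocate(self) -> int:
        """Pop the head of the free list; return its index, or -1 if the pool is exhausted."""
        if self.free_head == -1:
            return -1
        index = self.free_head
        (self.free_head,) = self._LINK.unpack_from(self.buffer, index * self.slot_size)
        return index

    def deallocate(self, index: int) -> None:
        """Push a slot back; its first 8 bytes are overwritten with the old head."""
        self._LINK.pack_into(self.buffer, index * self.slot_size, self.free_head)
        self.free_head = index

    def slot(self, index: int) -> memoryview:
        """Writable view of one slot's bytes."""
        start = index * self.slot_size
        return memoryview(self.buffer)[start : start + self.slot_size]


pool = SlotPool(slot_size=16, capacity=3)
first, second, third = pool.allocate(), pool.allocate(), pool.allocate()
exhausted = pool.allocate()   # -1: no slots left
pool.deallocate(second)
reused = pool.allocate()      # the most recently freed slot comes back first
```

Because freed slots go back on the head of the list, the pool reuses the most recently freed slot first, which tends to be the one still warm in cache.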

O(1) Means O(1)

Bump-pointer allocation is O(1) by construction: a comparison and an integer addition. Object pool allocation is also O(1): pop the free list head. Object pool deallocation is O(1): push onto the free list head.

Compare that to a general-purpose allocator like ptmalloc (glibc's default). It maintains multiple free lists binned by size, merges adjacent free blocks to fight fragmentation, and may call brk or mmap when the current heap is exhausted. Its fast paths are quick, but a request that misses them can mean searching bins, splitting or coalescing chunks, or making a system call, so the cost per call varies and is hard to bound. You're exposed to that variance every time you allocate a packet object that lives for 8 milliseconds.

[Figure: malloc vs. arena. Scattered, fragmented allocations on the left; tight contiguous slots on the right.]

The space story:

Metric	General allocator	Memory pool
Per-allocation overhead	8 to 16 bytes (size header + boundary tags)	0 bytes
External fragmentation	Yes (holes between live objects)	No
Internal fragmentation	Depends on size class	Yes (unused space in fixed slots)
Total footprint	Grows on demand	Fixed at creation

You trade flexibility for predictability. A pool can exhaust. A general allocator won't (until the OS says no). For many high-throughput systems, guaranteed O(1) with a known memory ceiling is worth more than a fast average case with an unpredictable tail.

There's a cache benefit too. Objects in a pool sit contiguously in memory. Sequential access gets cache-line prefetching for free because hardware prefetchers detect the sequential stride and pre-load upcoming lines before your code requests them. General-purpose allocations scatter across the heap, so each pointer dereference is a potential cache miss. The post on spatial and temporal locality has the concrete numbers.

Where You'll See This in Production

nginx creates a memory pool per HTTP request using its ngx_pool_t type. Every URL parser, header buffer, and upstream connection object for that request allocates from the pool. When the response is sent, the entire pool is destroyed in one call. No per-object tracking. No leak from a failed mid-request path. The nginx development guide documents the pool API directly.

Python's pymalloc uses a three-tier hierarchy: arenas contain pools, and pools are divided into fixed-size blocks. In the classic layout, arenas are 256 KB, pools are 4 KB, and blocks run from 8 to 512 bytes in multiples of eight; the exact sizes depend on the CPython version and platform (recent 64-bit builds use 1 MiB arenas, for example). Each pool holds blocks of exactly one size class. When you allocate a small Python object, pymalloc finds the right pool and pops a block, going back to the system allocator only when it needs a fresh arena. Large objects (over 512 bytes) bypass pymalloc entirely and go straight to malloc. The full design is in the Python memory management documentation.
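To make the size-class idea concrete, here is a small sketch of how a request size could map to a block size under the 8-byte, 512-byte-limit layout described above. It is a simplified model rather than CPython's actual code; size_class and both constants are names invented for this example.

```python
ALIGNMENT = 8            # assumed block granularity (the 8-byte layout described above)
SMALL_REQUEST_MAX = 512  # larger requests are not served from pools


def size_class(nbytes: int):
    """Return (class_index, block_size), or None if the request bypasses the pools."""
    if nbytes <= 0 or nbytes > SMALL_REQUEST_MAX:
        return None
    index = (nbytes - 1) // ALIGNMENT          # 1..8 -> 0, 9..16 -> 1, and so on
    return index, (index + 1) * ALIGNMENT


def wasted_bytes(nbytes: int) -> int:
    """Internal fragmentation: bytes in the block that the request does not use."""
    _, block_size = size_class(nbytes)
    return block_size - nbytes


small = size_class(1)      # (0, 8)
middle = size_class(9)     # (1, 16)
largest = size_class(512)  # (63, 512)
too_big = size_class(513)  # None
```

A 9-byte request lands in a 16-byte block, so 7 bytes go unused. That rounding is the internal fragmentation a fixed-slot design accepts in exchange for a constant-time lookup.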

Game engines use per-frame arenas as a standard pattern. A scene allocates thousands of particle objects, collision query results, and animation scratch buffers from a frame arena at the start of a frame. After rendering, the arena resets. No individual frees. No GC pause. The frame budget is predictable. This is also why game devs look at you funny when you mention a garbage collector.

Protocol Buffers (protobuf) exposes arena allocation as an explicit opt-in in its C++ API. Instead of allocating and freeing individual message objects, you allocate them all from a shared arena and release them together. The goal is to reduce allocator pressure in high-throughput services, such as gRPC backends, where millions of small messages pass through per second.

The Cleanup Advantage

One of the most insidious memory leaks in long-running services comes from partially constructed objects. You allocate A, then B, then C. C fails. Now you have to remember to free A and B before returning the error, on every code path. Miss one branch and you leak. You probably have leaked. We all have leaked.

With an arena, failure cleanup is one operation: reset the offset. Everything allocated so far is reclaimed in a single step. This is exactly why nginx's per-request pool design is so clean. If request parsing fails halfway through constructing headers, you destroy the pool and walk away. No partially freed state. No leak from the interrupted path. No 3am incident because someone forgot a free on the error branch added three months ago.
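Here is one way to sketch that cleanup pattern in Python, using a context manager so the reset happens on both the success and the failure path. The helper names request_scope and parse_request are hypothetical, and the failure is simulated with a flag.

```python
from contextlib import contextmanager


class Arena:
    """Same bump-pointer arena as earlier in the article, repeated so this runs on its own."""

    def __init__(self, capacity: int):
        self.buffer = bytearray(capacity)
        self.offset = 0

    def allocate(self, size: int) -> memoryview:
        if self.offset + size > len(self.buffer):
            raise MemoryError("arena exhausted")
        chunk = memoryview(self.buffer)[self.offset : self.offset + size]
        self.offset += size
        return chunk

    def reset(self) -> None:
        self.offset = 0


@contextmanager
def request_scope(arena: Arena):
    """Reset the arena when the block exits, whether it finished or raised."""
    try:
        yield arena
    finally:
        arena.reset()


def parse_request(arena: Arena, fail: bool) -> int:
    """Hypothetical handler: allocates A and B, then optionally fails before C."""
    arena.allocate(16)  # A
    arena.allocate(16)  # B
    if fail:
        raise ValueError("malformed header")
    arena.allocate(16)  # C
    return arena.offset


arena = Arena(1024)

try:
    with request_scope(arena) as scratch:
        parse_request(scratch, fail=True)
except ValueError:
    pass
offset_after_failure = arena.offset  # 0: A and B were reclaimed by the reset

with request_scope(arena) as scratch:
    bytes_used = parse_request(scratch, fail=False)  # 48 while the request is live
offset_after_success = arena.offset  # 0 again
```

The error branch contains no cleanup code at all. The only place that knows how to release memory is the scope itself.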

When Not to Use a Memory Pool

Pools are not a universal replacement for malloc. They work well when:

Objects share a lifetime (freed all at once, or in predictable batches)
You can bound the total memory needed upfront
Allocation frequency is high enough that per-call overhead matters

They're the wrong tool when objects have wildly different lifetimes and you need fine-grained individual deallocation in an unpredictable order. A service where some objects live 10 milliseconds and others live 10 minutes will waste pool memory holding logically-free slots that can't be returned to the OS until the entire pool is destroyed. A general-purpose allocator handles that case better. With a pool, you pay for the memory upfront, and the pool holds onto it whether you need it or not.

Watch out for one specific bug: if you use a bump allocator and hold raw pointers into it, then call reset(), those pointers are now dangling. The memory is logically reclaimed but the pointer still points into it. The same bug exists with free() and raw pointers generally, but arenas make it easier to trigger accidentally because reset() looks harmless. It's the "harmless refactor" of memory management.
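The bug is easy to reproduce in Python, where a memoryview plays the role of the raw pointer. The sketch below also shows one possible mitigation, a generation counter that lets handles detect a reset; CheckedArena and Handle are illustrative names, not part of any library.

```python
class CheckedArena:
    """Bump arena whose handles remember which generation they were issued in."""

    def __init__(self, capacity: int):
        self.buffer = bytearray(capacity)
        self.offset = 0
        self.generation = 0

    def allocate(self, size: int) -> "Handle":
        if self.offset + size > len(self.buffer):
            raise MemoryError("arena exhausted")
        handle = Handle(self, self.offset, size, self.generation)
        self.offset += size
        return handle

    def reset(self) -> None:
        self.offset = 0
        self.generation += 1  # invalidates every handle issued before this point


class Handle:
    def __init__(self, arena: CheckedArena, start: int, size: int, generation: int):
        self.arena = arena
        self.start = start
        self.size = size
        self.generation = generation

    def view(self) -> memoryview:
        """Return the bytes, refusing if the arena has been reset since allocation."""
        if self.generation != self.arena.generation:
            raise RuntimeError("stale handle: arena was reset")
        return memoryview(self.arena.buffer)[self.start : self.start + self.size]


arena = CheckedArena(16)
old = arena.allocate(4)
raw = old.view()        # a raw view, the moral equivalent of a raw pointer
raw[:] = b"AAAA"

arena.reset()
new = arena.allocate(4)
new.view()[:] = b"BBBB"

aliased = bytes(raw)    # the raw view silently sees the new owner's data
try:
    old.view()
    stale_detected = False
except RuntimeError:
    stale_detected = True
```

The raw view keeps working and quietly reads someone else's bytes, which is the dangerous part. The handle catches the mistake, at the cost of one comparison per access.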

Why This Shows Up in Interviews

Memory pools surface in system design interviews whenever the question involves latency, throughput, or memory efficiency at scale. Asked to design a high-throughput API gateway? Per-request arenas for parsing buffers. Asked to design a game backend that handles thousands of simultaneous connections? Per-connection object pools for packet objects. Asked why a CPython process doesn't always hand memory back to the OS after objects are freed? pymalloc's three-tier pool design is the starting point.

LeetCode 2502 ("Design Memory Allocator") asks you to implement a simplified pool allocator with allocate and free operations on a fixed-size memory block. It's a medium, but interviewers at companies with large-scale infrastructure sometimes ask it as a warmup before a system design conversation.
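For reference, a brute-force sketch of that problem might look like the following. It is a first-fit scan over an array of unit cells, written from the problem as I understand it: allocate returns the leftmost start index of a free run (or -1), and freeing an ID releases every cell it owns and returns the count. The Python method names here are assumptions.

```python
class Allocator:
    """First-fit allocator over n unit cells; each cell stores its owner's ID, or 0 if free."""

    def __init__(self, n: int):
        self.cells = [0] * n

    def allocate(self, size: int, m_id: int) -> int:
        """Claim the leftmost run of `size` free cells for m_id; return its start, or -1."""
        run = 0
        for i, owner in enumerate(self.cells):
            run = run + 1 if owner == 0 else 0  # length of the free run ending at i
            if run == size:
                start = i - size + 1
                self.cells[start : i + 1] = [m_id] * size
                return start
        return -1

    def free_memory(self, m_id: int) -> int:
        """Release every cell owned by m_id and return how many were freed."""
        freed = 0
        for i, owner in enumerate(self.cells):
            if owner == m_id:
                self.cells[i] = 0
                freed += 1
        return freed


mem = Allocator(10)
results = [
    mem.allocate(1, 1),    # 0
    mem.allocate(1, 2),    # 1
    mem.allocate(1, 3),    # 2
    mem.free_memory(2),    # 1 cell freed
    mem.allocate(3, 4),    # 3: the hole at index 1 is too small
    mem.allocate(1, 1),    # 1: fills the hole
    mem.allocate(1, 1),    # 6
    mem.free_memory(1),    # 3 cells freed
    mem.allocate(10, 2),   # -1: no run of 10 free cells
    mem.free_memory(7),    # 0: ID 7 owns nothing
]
```

Both operations are O(n) here, which makes a useful contrast with the O(1) pools above: this is the general variable-size problem that a fixed-slot pool avoids.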

If you want to practice explaining allocator tradeoffs out loud under time pressure, the way you'd actually have to in a system design round, SpaceComplexity runs voice-based mock interviews with rubric-based feedback where you can work through exactly these kinds of design decisions.