Separate Chaining: The Collision Strategy Behind Every Hash Map

Two keys hash to the same bucket. Your hash table has a problem. What it does next determines whether you understand hash maps or just use them.

Separate chaining is the answer most production hash maps give. Java's HashMap uses it. Go's map uses it. Once you understand it, you understand why hash maps are O(1) on average and O(n) in the worst case, and why those two facts coexist without contradiction. Which, by the way, is exactly the trap your interviewer is setting when they ask "so is HashMap lookup always O(1)?"

The Problem: Two Keys, One Slot

A hash table maps keys to array indices using a hash function. You hash "alice", get index 3, store the value there. Fast. Predictable.

But hashing is many-to-one. There are more possible keys than there are array slots. Two different keys can produce the same index. "alice" and "bob" might both hash to 3. That's a collision, and every hash table needs a collision resolution strategy.

There are two major approaches: separate chaining and open addressing (where you probe through the array for an empty slot). This post is about chaining. Open addressing is its own rabbit hole.

The Fix: Each Bucket Holds a List

In separate chaining, each bucket doesn't hold a single key-value pair. It holds the head of a linked list. When two keys hash to the same bucket, you stick the second one in a new list node and chain it off the first. Third collision? Another node. Keep going.

The array slots don't hold values. They hold pointers to lists.

Lookup: hash the key, go to that bucket, walk the chain until you find the key or hit the end. Insert: hash, go to bucket, prepend a new node. Delete: same walk, then remove the node and relink.

A node in the chain holds the key, the value, and a pointer to the next node:

class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None

In Java's HashMap source, the entry looks like this:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;  // cached so rehashing doesn't recompute it
    final K key;
    V value;
    Node<K,V> next;
}

The hash field is a small optimization: during rehashing, Java reuses the precomputed hash instead of calling hashCode() again on every single key.

Walk Through a Collision

Say you have 5 buckets (indices 0 through 4) and a hash function that sums ASCII values and takes mod 5.

Insert four words: "cat", "act", "dog", "god".

"cat": C(67) + A(65) + T(84) = 216. 216 % 5 = 1. Bucket 1: [cat]
"act": A(65) + C(67) + T(84) = 216. 216 % 5 = 1. Collision. Bucket 1: [act → cat]
"dog": D(68) + O(79) + G(71) = 218. 218 % 5 = 3. Bucket 3: [dog]
"god": G(71) + O(79) + D(68) = 218. 218 % 5 = 3. Collision. Bucket 3: [god → dog]

The table looks like this:

Index 0: null
Index 1: act → cat → null
Index 2: null
Index 3: god → dog → null
Index 4: null

Separate chaining diagram: 5 buckets with linked list chains showing cat/act colliding at index 1 and god/dog colliding at index 3

Yes, "cat" and "act" hash identically. So do "dog" and "god". This hash function is terrible for words because anagrams always collide (they have the same letters, so they have the same ASCII sum). A real hash function (polynomial rolling hash, MurmurHash, SipHash) spreads keys uniformly regardless of the key structure. The point of the example is to show the chaining mechanics, not to sell you on ASCII summation.

Load Factor: The Number That Runs Everything

The load factor α = n / m, where n is the number of stored keys and m is the number of buckets.

Under the Simple Uniform Hashing Assumption, each key independently hashes to any bucket with equal probability, so the expected chain length at any bucket is exactly α. 100 keys in 100 buckets, α = 1.0, chains average one node. 1000 keys in 100 buckets, α = 10, chains average ten nodes. At α = 10, every "O(1) lookup" is actually a ten-node list walk. That's not great.

Expected cost of an unsuccessful lookup: Θ(1 + α)
Expected cost of a successful lookup: Θ(1 + α/2)

Both are O(1) as long as α is bounded by a constant. This is why hash tables rehash when the load factor gets too high. Java's HashMap rehashes at α = 0.75. Python's dict rehashes at α ≈ 0.67. In both cases, crossing the threshold doubles the bucket array and redistributes every key.

That single rehash is O(n). Spread across all the insertions that led up to it, each insertion costs O(1) amortized. The "amortized" qualifier is load-bearing. Without it, the claim is false.

Separate Chaining Implementation

A stripped-down Python implementation shows the full structure:

class HashMap:
    def __init__(self, capacity=16, load_factor=0.75):
        self.capacity = capacity
        self.load_factor = load_factor
        self.size = 0
        self.buckets = [None] * self.capacity

    def _hash(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        i = self._hash(key)
        node = self.buckets[i]
        while node:
            if node.key == key:
                node.value = value  # update existing key
                return
            node = node.next
        new_node = Node(key, value)
        new_node.next = self.buckets[i]  # prepend to chain
        self.buckets[i] = new_node
        self.size += 1
        if self.size / self.capacity > self.load_factor:
            self._rehash()

    def get(self, key):
        i = self._hash(key)
        node = self.buckets[i]
        while node:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def _rehash(self):
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [None] * self.capacity
        self.size = 0
        for head in old_buckets:
            node = head
            while node:
                self.put(node.key, node.value)
                node = node.next

put walks the chain first to handle updates, then prepends the new node if the key is absent. Prepending is O(1); appending would require walking the entire chain. You lose insertion order but gain constant-time inserts. The insert-order trade-off is the kind of thing that sounds academic until you have a production bug where you expected ordered keys and got chaos.

When the Worst Case Is Not Theoretical

Here is the part where separate chaining stops being a CS textbook exercise and becomes a production horror story.

If your hash function maps every key to the same bucket, all n keys end up in one chain. Every lookup is a linear scan. O(n). The thing you bought a hash map to avoid.

In 2011, researchers demonstrated this against multiple production language runtimes. By crafting 65,536 keys that all hashed to the same bucket in PHP 5, they made a single HTTP request consume 30 seconds of CPU time on the server. One request. Thirty seconds. An entire web server sidelined because someone understood hash tables better than the runtime authors did.

The O(1) guarantee depends on uniform key distribution. Uniform distribution depends on a hash function with a secret seed the attacker cannot predict.

That was the fix. Not a better hash function in the mathematical sense, just a randomized seed so an attacker can't precompute colliding keys for your specific process. Python randomizes its hash seed on every process startup (PYTHONHASHSEED). Java applies extra bit mixing in HashMap.hash() to defend against poor hashCode() implementations. Go uses a per-map random seed chosen at map creation. All of them learned this lesson the same way: someone in security told them they had a problem, or someone on the internet demonstrated it publicly.

Java 8: Lists Become Trees

Java added a second line of defense in Java 8. When a single chain grows past 8 nodes (TREEIFY_THRESHOLD = 8), the linked list at that bucket converts to a red-black tree. Lookups within that bucket drop from O(n) to O(log n).

When deletions shrink the tree back below 6 nodes (UNTREEIFY_THRESHOLD = 6), it converts back to a linked list. The hysteresis between 6 and 8 prevents thrashing at the boundary.

Linked lists are faster for short chains: lower per-node overhead, better cache behavior, simpler traversal. Trees are safer for long chains: logarithmic worst case instead of linear. The hybrid uses each structure where it makes sense, which is what a lot of "which data structure should I use" interview questions are actually about.

This is also a fun piece of trivia for the "is HashMap O(1)?" follow-up. The real answer is: expected O(1) for inserts and lookups assuming reasonable load factor and hash distribution, O(log n) within a bucket in the Java 8 treeified worst case, and O(n) for the full scan if you somehow end up with a massive treeified chain and you're counting nodes visited. The interviewer probably just wants you to know about the load factor. But you can mention the red-black tree thing if you want to watch them blink.

Separate Chaining vs Open Addressing

Open addressing keeps all entries inside the main array. On collision, you probe for the next empty slot: one index forward (linear probing), a quadratic offset (quadratic probing), or a second hash function (double hashing).

Property	Separate Chaining	Open Addressing
Where entries live	External linked list per bucket	Main array
Extra memory	Pointer per node (~8 bytes)	None
Cache performance	Worse (pointer chasing)	Better (sequential access)
Load factor ceiling	Can exceed 1.0	Must stay below 1.0
Space (n keys, m buckets)	O(n + m)	O(m)
Deletion complexity	Simple (relink nodes)	Tricky (tombstones required)

Java's HashMap uses separate chaining. Python's dict uses open addressing. Go's map uses chaining with an unusual bucket layout that groups 8 key-value pairs per bucket before chaining. Open addressing wins when keys are small, reads dominate, and you care about memory. Chaining wins when inserts and deletes are frequent and you want deletion to stay simple.

How This Shows Up in Interviews

Most problems don't ask you to implement a hash table. But separate chaining matters in a few specific situations.

"Design a HashMap" (LeetCode 706) and "Design a HashSet" (LeetCode 705) ask you to build one from scratch without built-in hash map structures. Separate chaining is the natural choice: pick a prime number of buckets, chain collisions, handle rehashing if you want full credit. The prime-bucket-count detail earns you a nod from interviewers who have seen too many people pick 10.

The LRU Cache implementation uses a HashMap internally for O(1) lookup. When an interviewer asks why that lookup is O(1), you need to explain the chain structure and load factor, not just say "it's a hash map."

More often, the follow-up catches people off guard: you say your solution uses a HashMap so all operations are O(1), and the interviewer asks "is that always true?" The honest answer is expected O(1) under uniform hashing, worst-case O(n) in the degenerate case. The load factor and hash function quality are what keep you in the average case. Saying this out loud, clearly, is what separates a "strong hire" write-up from a "decent with hash maps" write-up.

Explaining these internals under pressure is a different skill from knowing them on paper. SpaceComplexity runs voice-based mock interviews with rubric-based feedback on exactly this kind of technical depth.

For the math behind why O(1) holds up, the hash map time complexity post walks through the SUHA proof and expected chain length derivation. For the hash function side, how hash functions work covers polynomial rolling hash and why seed randomization matters.