Linear Probing vs Separate Chaining: When Cache Beats Simplicity

- Separate chaining stores collisions in per-bucket linked lists; deletions are trivial and load factor can safely exceed 1.0
- Linear probing keeps all keys in a flat array, giving cache-line locality and 20–50% better read throughput at moderate load
- Primary clustering makes linear probing degrade catastrophically above α ≈ 0.7; separate chaining stays near O(1) up to α = 2+
- Tombstones are required for correct linear probing deletions; accumulated tombstones raise the effective load factor over time
- Every major standard library (Java HashMap, Python dict, Go map, C++ unordered_map) ships separate chaining or a hybrid for predictability across all usage patterns
- Pick linear probing for memory-tight, read-heavy workloads with a controlled load factor; pick separate chaining for general use or frequent deletions
Two hash tables walk into an interview. Both claim O(1) lookup. Both are right. Neither mentions that one of them quietly falls apart as the table fills up, at which point it starts doing something closer to O(please-god-stop-probing).
The collision strategy is where they diverge. Separate chaining appends to a list inside each bucket. Linear probing scans forward for the next open slot. Same average-case complexity on paper, completely different behavior when your load factor gets reckless. The question is not which is theoretically better. It is which fits your actual access pattern and how much you trust yourself to keep α below 0.7.
Every Bucket Gets Its Own List
Each bucket holds a pointer to a linked list. When two keys hash to the same index, both live in that bucket as linked list nodes. It is the hash table equivalent of roommates: a little crowded, but everyone has a spot.
class SeparateChaining: def __init__(self, capacity=16): self.capacity = capacity self.buckets = [[] for _ in range(capacity)] def _index(self, key): return hash(key) % self.capacity def put(self, key, value): i = self._index(key) for j, (k, v) in enumerate(self.buckets[i]): if k == key: self.buckets[i][j] = (key, value) return self.buckets[i].append((key, value)) def get(self, key): i = self._index(key) for k, v in self.buckets[i]: if k == key: return v return None def delete(self, key): i = self._index(key) self.buckets[i] = [(k, v) for k, v in self.buckets[i] if k != key]
The table can hold more entries than it has buckets. Load factor α = n/m (entries over buckets) can exceed 1 without breaking anything. The chains just grow longer, and the expected lookup cost grows proportionally but politely. Deletion is trivial: find the node, remove it, and nothing else in the structure needs to change. No haunted slots. No undead sentinels. Just a shorter list.
One Flat Array, No Pointers
All keys live directly in a flat array. No linked lists, no heap allocations per entry. When a collision occurs, the algorithm steps forward one slot at a time, wrapping around, until it finds an empty slot. It is aggressively literal.
class LinearProbing: _DELETED = object() # tombstone sentinel def __init__(self, capacity=16): self.capacity = capacity self.table = [None] * capacity self.size = 0 def _start(self, key): return hash(key) % self.capacity def put(self, key, value): i = self._start(key) while self.table[i] is not None and self.table[i] is not self._DELETED: if self.table[i][0] == key: self.table[i] = (key, value) return i = (i + 1) % self.capacity self.table[i] = (key, value) self.size += 1 def get(self, key): i = self._start(key) while self.table[i] is not None: if self.table[i] is not self._DELETED and self.table[i][0] == key: return self.table[i][1] i = (i + 1) % self.capacity return None def delete(self, key): i = self._start(key) while self.table[i] is not None: if self.table[i] is not self._DELETED and self.table[i][0] == key: self.table[i] = self._DELETED self.size -= 1 return i = (i + 1) % self.capacity
You cannot simply null out a deleted slot. If a lookup once probed through that slot to find a key further along, nulling it breaks the probe sequence and gives you a false miss. The lookup stops, sees an empty slot, and concludes the key doesn't exist. It does exist. You just told the algorithm to give up too early.
Tombstones tell the probe sequence "there used to be something here, keep walking." They work. They are also slightly cursed.
Here is a concrete example. Capacity 8, keys hash as: "apple"→2, "mango"→2, "grape"→3.
Insert "apple" → slot 2: [_, _, apple, _, _, _, _, _]
Insert "mango" → slot 2 taken, probe to slot 3: [_, _, apple, mango, _, _, _, _]
Insert "grape" → slot 3 taken, probe to slot 4: [_, _, apple, mango, grape, _, _, _]
"grape" is now two slots from home. Every new key hashing to slot 2, 3, or 4 extends this run. That is primary clustering: occupied slots attract more occupied slots, like a highway accident that causes three more accidents.
Delete "mango":
[_, _, apple, DELETED, grape, _, _, _]
Lookup for "grape" starts at slot 3, sees DELETED, continues to slot 4, finds it. Write None instead of DELETED and the lookup returns None. Wrong answer. Congrats on the bug.
Cache Is Linear Probing's Superpower
Linear probing wins here, and it is not close.
Every key lives in a contiguous array. When you access index i, your CPU loads a 64-byte cache line and pulls in roughly 15 neighboring slots simultaneously. Probing forward costs almost nothing because those slots are already sitting in L1 cache, warm and ready.
Separate chaining follows a pointer to a heap-allocated node that could be anywhere in memory. Following three nodes in a chain means three potential cache misses, each costing 80-100 ns against DRAM. At that price, "O(1) expected" starts to feel like a very optimistic estimate.
Benchmarks on read-heavy workloads consistently show linear probing 20-50% faster than separate chaining at moderate load factors. The asymptotic complexity is the same. The constant factor is not.
Primary Clustering Is Linear Probing's Achilles Heel
The expected probe count for a successful lookup under linear probing:
probes_successful = (1/2) * (1 + 1/(1 - α))
Watch what happens as you let α creep up:
At α = 0.5: 1.5 probes. Completely fine.
At α = 0.7: 2.2 probes. Still acceptable.
At α = 0.9: 5.5 probes. Your cache advantage is now gone.
At α = 0.95: 10.5 probes. You are doing linear search with extra steps.
For unsuccessful lookups the formula is (1/2) * (1 + 1/(1-α)²). At α = 0.9 that becomes 50 probes. Not a typo. Fifty probes to confirm a key is absent.
Separate chaining degrades much more gracefully. Expected successful lookup is 1 + α/2. At α = 0.9 that is 1.45 probes. At α = 2.0 it is still only 2. The chains get longer, but each lookup only walks its own chain.
Linear probing must stay below α ≈ 0.7. Separate chaining tolerates α well above 1.0. This is not a minor caveat. It is the entire story of why production code defaults to chaining.
This is why practical hash map load factors differ between implementations. Python's dict (open addressing) resizes at α ≈ 0.67. Java's HashMap (separate chaining) resizes at α = 0.75 and can tolerate higher before performance noticeably degrades.
Deletion Is Where Linear Probing Gets Awkward
Separate chaining: unlink the node. Done. Go home. Nothing else changes.
Linear probing: either use tombstones (which accumulate over time, haunting your table and slowing all future probes) or implement backward-shift deletion. Backward-shift scans forward after each deletion and moves displaced keys back toward their home slots. It is correct. It is also the kind of code that makes you wonder how much of your life you want to spend on this.
A table that has grown and shrunk several times will have DELETED slots interspersed with live entries, effectively raising the functional load factor for probe purposes. If your workload deletes frequently, tombstone accumulation is a real performance problem, not a hypothetical one. The table is full of ghost entries that consume probe steps without holding any data.
Memory Favors Linear Probing
One flat array, no pointers, no allocator overhead. Each node in separate chaining costs 8 bytes for the next pointer on a 64-bit system, plus 16-32 bytes for heap metadata per allocation. For small values like integers, chaining can use two to three times as much memory as probing for the same number of entries.
If you are building something memory-constrained, this is real money.
Linear Probing vs Separate Chaining: The Numbers
| Separate Chaining | Linear Probing (α < 0.7) | |
|---|---|---|
| Insert (avg) | O(1) | O(1) |
| Lookup success (avg) | O(1 + α/2) | O((1 + 1/(1-α)) / 2) |
| Lookup failure (avg) | O(1 + α) | O((1 + 1/(1-α)²) / 2) |
| Delete | O(1) | O(1) + tombstone cost |
| Memory | O(n + m) | O(m) |
| Load factor tolerance | α > 1.0 fine | α > 0.7 painful |
| Cache behavior | Pointer chasing | Sequential access |
Both share the same worst case: O(n), when all keys hash to one bucket. That is a hash function problem, not a collision strategy problem. A good hash function with randomized seeds prevents adversarial inputs from triggering it. Hash collisions are the threat; the collision strategy just determines how bad the damage is when they happen.
What Java, Python, and Go Actually Ship
Java's HashMap uses separate chaining. Since Java 8, chains that grow past 8 entries are promoted to red-black trees, giving O(log n) worst case instead of O(n). Default load factor: 0.75. Java's answer to the clustering problem was to just add another data structure inside the bucket.
Python's dict uses open addressing with pseudorandom probing. The probe sequence is i = (5*i + 1 + perturb) % capacity, where perturb shifts right each step. This breaks up primary clustering that pure linear probing produces. Resizes at α ≈ 0.67. Python knows about the clustering problem and took the time to randomize its way out of it.
Go's map uses separate chaining with a bucket design that packs 8 key-value pairs into each bucket, with keys stored separately from values for better cache efficiency. This hybrid gets you some spatial locality without the full fragmentation of naive chaining.
C++'s std::unordered_map uses separate chaining. Default max load factor: 1.0. At α = 1.0 you are flirting with trouble, but the standard committee decided that was someone else's problem.
Every major language standard library ships with separate chaining as the default. The cache advantage of linear probing is real and measurable, but separate chaining's predictability and simpler deletion semantics win at the library level, where correctness across every possible usage pattern matters more than peak throughput at a carefully tuned load factor.
Which Should You Pick?
Use separate chaining when:
- Deletions are frequent (tombstone management is a headache you do not need)
- Load factor might vary or exceed 1.0
- You are matching what production libraries already do
- Implementation correctness under all conditions is the priority
Use linear probing when:
- Memory is tight and pointer overhead per entry is unacceptable
- Your workload is read-heavy with a stable, moderate load factor
- You need cache-optimal performance for small, fixed-size keys
- You have precise control over load factor and resize thresholds and you will actually maintain that control
For hash map time complexity that holds up under real load: separate chaining is easier to get right, and linear probing is faster when you keep α below 0.7. Most production code reaches for chaining and calls it a day. Performance-critical systems (in-memory databases, low-latency caches) often reach for linear probing with careful tuning and a person responsible for keeping that load factor honest.
If you are prepping for a technical interview and get asked to implement a hash map from scratch, know both. Interviewers specifically probe whether you understand clustering, the tombstone deletion problem, and why your resize threshold differs between the two strategies. Running a mock interview on SpaceComplexity will surface exactly these follow-up questions in a live, voice-based session, so you can practice articulating the tradeoffs under pressure rather than discovering the gaps on interview day.