Bloom Filters in Distributed Systems: The System Design Interview Guide

- Bloom filters give certain "no" answers and probabilistic "yes" answers, with zero false negatives guaranteed
- 10 bits per element at 1% false positive rate is the sizing number to quote on any whiteboard
- Cassandra, RocksDB, and Bigtable all keep one Bloom filter per SSTable in memory to eliminate unnecessary disk reads
- No deletes is the key constraint to name alongside every Bloom filter proposal in an interview
- Counting and cuckoo filters are the right answers when your set is mutable and deletion is required
- Ribbon filter is what to mention if an interviewer asks what improves on a standard Bloom filter
Your distributed cache serves a billion users. Every cache miss hits the database. Most of those misses are for keys that don't exist at all: deleted users, expired sessions, bad inputs. You're paying a full database round-trip to confirm that nothing is there.
That's not a slow query. That's paying rent on an empty apartment. A billion times a day.
A Bloom filter tells you, in O(1) time and a handful of megabytes, whether something is definitely not in a set. That asymmetry (certain "no," probable "yes") turns out to be exactly what distributed systems need.
How a Bloom Filter Works
A Bloom filter is a bit array of m bits, all initialized to 0, paired with k independent hash functions. Each hash function maps an input to a position in that array.
To insert an element: run it through all k hash functions, set those k bit positions to 1.
To query: run the element through the same k functions. If any bit is 0, the element is definitely not in the set. If all k bits are 1, the element is probably in the set. But you might be seeing a false positive, where other insertions happened to set the same bits.
Two guarantees worth memorizing: there are no false negatives, and there is no delete operation on a standard Bloom filter. Unsetting a bit could invalidate other elements that share it. Once a bit flips to 1, it stays there. Permanent, like a tattoo, and about as hard to explain in an interview.
import mmh3 from bitarray import bitarray class BloomFilter: def __init__(self, size: int, hash_count: int): self.size = size self.hash_count = hash_count self.bits = bitarray(size) self.bits.setall(0) def add(self, item: str) -> None: for seed in range(self.hash_count): index = mmh3.hash(item, seed) % self.size self.bits[index] = 1 def check(self, item: str) -> bool: return all( self.bits[mmh3.hash(item, seed) % self.size] for seed in range(self.hash_count) )
If you want the full implementation walkthrough, including all 14 language implementations, the data structure deep dive has it. This article is about where Bloom filters show up in production and how to talk about them in a system design interview.
Size It Before You Say It
You don't need to derive the formulas under pressure. Genuinely. No one is asking you to differentiate ln(2) squared in a system design interview while your screen share is lagging.
You need the practical numbers and the ability to size a filter on a whiteboard.
The rule of thumb: 10 bits per element at 1% false positive rate.
| Target FPR | Bits per element | Hash functions |
|---|---|---|
| 10% | 4.8 | 3 |
| 1% | 9.6 | 7 |
| 0.1% | 14.4 | 10 |
| 0.01% | 19.2 | 13 |
For 1 billion elements at 1% FPR: roughly 1.2 GB. A hash set holding the same billion string keys, at 32 to 50 bytes per key, would cost 32 to 50 GB. The space savings are around 25 to 40x, with identical O(1) lookup complexity.
This is the number to drop in interviews: "at 1% false positive rate, a Bloom filter costs about 10 bits per element regardless of element size." An interviewer who asks "how do you size it" gets a concrete answer, not a shrug and "it depends."
Where They Show Up in Production
Cassandra and the SSTable read path. Cassandra writes data to immutable sorted files on disk called SSTables. A single read might need to check several SSTables to find the right one, and each check is a disk seek.
Cassandra keeps one Bloom filter per SSTable in off-heap RAM. Before touching disk, it queries the filter. "Definitely not here" means skip entirely. The default false positive chance is 0.1% in most compaction strategies. This single filter eliminates the majority of unnecessary disk reads in a read-heavy Cassandra cluster. HBase and Google Bigtable use the same design. BigTable's original paper describes filters that prevent most disk lookups for reads of non-existent rows or columns.

RocksDB and LevelDB follow the same architecture. Every SST file gets a Bloom filter. RocksDB's format version 5 achieves below 0.1% FPR at 16 bits per key.
Akamai's CDN cache. Akamai found that 75% of assets in their edge caches were "one-hit wonders," requested exactly once and never again. Caching them was pure waste: memory used once, then evicted.
Their fix was a Bloom filter in front of the cache write path. On a first request, the filter says "definitely not seen before," so Akamai serves the content without caching it. On a second request, the filter says "probably seen before," so they cache it.
False positives here just mean caching something a touch too eagerly. The cost is basically zero. The result: cache capacity for genuinely popular content tripled without adding hardware. Sometimes being slightly wrong is completely fine, which is a sentence you can say about Bloom filters but not about most things in production.
Web crawlers and URL deduplication. At the scale of a web crawler, billions of URLs need to be tracked to avoid re-crawling. A hash set of URLs doesn't fit in memory. A Bloom filter holding the same set does. The web crawler system design covers this in detail, but the key point is that a Bloom filter gives you the deduplication guarantee at a fraction of the memory cost, with the small risk of occasionally re-crawling a URL flagged as a false positive.
The Costs You Can't Ignore
An interviewer wants to hear the downsides as clearly as the upsides.
False positives have a real cost. At 1% FPR, 1 in 100 "probably yes" answers is wrong. In Cassandra, that means 1 in 100 checked SSTables does a full disk read and finds nothing. That's acceptable. In a security context, an IP blocklist or a fraud detection gate, false positives might mean passing through traffic you meant to reject. The right FPR depends on the cost of acting on a wrong answer.
No deletes. If your set changes (users are removed, sessions expire, keys are invalidated), a standard Bloom filter can't track that. You either accept the stale false positives or rebuild the filter periodically. In Cassandra, SSTables are immutable and compacted away, so the filter is rebuilt naturally. In a mutable use case, the no-delete constraint is a real limitation. The counting Bloom filter variant solves it, at 3 to 4x the memory cost. There's always a price.
Fixed size. A Bloom filter is sized for a projected number of elements. If the dataset doubles, the FPR rises unless you rebuild at larger size. Plan the sizing conversation in advance: "We expect 500 million elements in year one. At 1% FPR that's 600 MB. At 1 billion elements the filter grows to 1.2 GB, which is still fine for an in-memory component." Interviewers love when you think past the happy path.
Still a pre-filter, not a replacement. After a "probably yes," you still do the real lookup. The filter reduces the cost of that path. It doesn't eliminate it. Your system needs to handle the case where the lookup finds nothing.
When Standard Isn't Enough
Counting Bloom filter. Replaces each bit with a small counter. Supports deletion by decrementing on remove. Uses 3 to 4x the space of a standard filter. Useful when your set is mutable and rebuild isn't practical.
Cuckoo filter. Uses cuckoo hashing to store fingerprints rather than setting bits. Supports deletion, uses less space than a Bloom filter when FPR is below 3%, and is faster to query. The tradeoff is implementation complexity.
Ribbon filter. RocksDB's preferred replacement for Bloom filters in newer SST file levels. About 30% smaller at equivalent FPR, at the cost of 3 to 4x higher CPU during construction. A solid answer if an interviewer asks "what's better than a Bloom filter."
You don't need to implement these in an interview. Knowing they exist and when to reach for them is what separates a practiced answer from a textbook one. "What's better than a Bloom filter" is a real follow-up question, and "I don't know" is not a great answer when you're going for staff.
How Do You Actually Talk About This in an Interview?
Most candidates who know Bloom filters still fumble this. They name the data structure and move on. The interviewer nods, writes something like "mentioned Bloom filter," and gives you a 2 out of 4 on problem-solving.
The signal to reach for a Bloom filter is any of these appearing in a design:
- "Check existence before an expensive disk read or cross-service call"
- "The membership set is too large to hold in a hash set in memory"
- "False positives are tolerable, but false negatives are not"
- "We want to reduce disk I/O for reads that often find nothing"
When you mention it, size it immediately. "At 1% false positive rate, we need about 10 bits per element. With 1 billion users, that's 1.2 GB of RAM to keep the filter resident in memory, well within the budget for a cache tier." That sentence tells the interviewer you understand the practical cost, not just the concept.
Then name the tradeoff. "The constraint here is no deletes. If users can be deactivated, we'd need to rebuild the filter nightly or use a counting variant with higher memory overhead."
Then name where it lives architecturally. "One filter per storage shard, loaded into memory at startup, queried before every disk read. That's the Cassandra and BigTable pattern." Grounding it in a well-known system adds credibility and moves the conversation forward.
The mistake candidates make is mentioning the data structure without explaining the decision. An interviewer doesn't want to hear "we can use a Bloom filter here." They want to hear "we can use a Bloom filter here because we're doing membership checks on a 500-million-element set where false positives send us to disk at most 1% of the time, which is acceptable given the 30x memory savings over a hash set."
One sentence. Size, tradeoff, acceptable cost. That's the whole thing.
Practice articulating that tradeoff out loud. It reads cleanly on paper but the sizing calculation and the constraint acknowledgment have to come naturally under time pressure. Reading this article is a start. Saying it in a fake interview, on camera, while someone is silently judging your architecture choices, is a different skill entirely. SpaceComplexity runs voice-based system design interviews where you can work through exactly this kind of decision under realistic conditions.
For more on the distributed systems that use Bloom filters at their storage layer, the distributed cache system design walkthrough and key-value store design both cover the read path in depth.