Geo Proximity Service System Design Interview: Geohash Is the Answer

You know the requirements. You have the API. Then the interviewer asks: "How do you actually find businesses within 5 km of the user?" Your first instinct is WHERE latitude BETWEEN X AND Y AND longitude BETWEEN A AND B.

That instinct is wrong. And it's wrong in exactly the way that reveals whether you've thought seriously about spatial data.

This covers the full Yelp-style geo proximity service system design interview: requirements, architecture, geohash spatial indexing, and the scaling discussion.

Set the Scope First

Five minutes on requirements pays off later.

Functional:

Users search for businesses by location, radius, and category
Business owners create, update, and close listings
Users can read business details: name, address, rating, photos

Non-functional:

Search returns results in under 100 ms at the 99th percentile
Read-heavy: searches are roughly 100x more frequent than writes
High availability; a few seconds of stale data in search results is acceptable

Scale estimates: Assume 100M DAU, each running 5 searches per day. That's 500M searches per day, roughly 6,000 QPS at baseline. With a 3x peak multiplier, plan for 18,000 QPS. Write QPS is much lower: 5M businesses updated infrequently.

Two Services, One Key Decision

Write and read traffic have completely different patterns. A business listing update competing for resources with 18,000 search queries per second is not a design you want. So you split.

Business Service handles write traffic. Business owners call this to create and update listings. It writes to a business database and triggers an async job to update the geospatial index. Latency here can be 200-300 ms and nobody cares.

Location-Based Service (LBS) handles read traffic. Every search goes through the LBS. It must be stateless, horizontally scalable, and backed by fast caches. Target p99 under 100 ms. This is the service your interviewer actually wants to talk about.

Architecture diagram showing LBS and Business Service behind an API gateway, with Redis cluster and geospatial index DB on the read path, Business DB and async worker on the write path The two services share nothing except a database. That is the point.

Why Range Scans on Lat/Lng Fail

The naive approach:

SELECT * FROM businesses
WHERE lat BETWEEN 37.72 AND 37.83
  AND lng BETWEEN -122.50 AND -122.35;

MySQL cannot efficiently combine two separate B-tree indexes for simultaneous range conditions on different columns. In practice, the query planner picks one column's index, reduces 200M rows to maybe 10M, then does a sequential scan on the second column. That's still millions of comparisons per query.

You will get this wrong once in production, spend three hours looking at slow query logs, and then you will never forget it.

The root problem is that latitude and longitude are two independent 1D values, but location is a 2D concept. You need an index that reasons about 2D proximity. The interview-standard answer is geohash.

Geohash: Folding 2D Into 1D

Geohash encodes any (lat, lng) pair as a short alphanumeric string such that strings sharing a prefix are geographically close.

The encoding works by recursive bisection:

Take the longitude range [-180, +180]. If the point is in the upper half, record bit 1; lower half, bit 0. Halve the range and repeat.
Alternate: the next bit comes from latitude, the one after from longitude, and so on.
Every 5 bits of the interleaved sequence become one character in a base-32 alphabet (0-9 and bcdefghjkmnpqrstuvwxyz, with a, i, l, o omitted to avoid ambiguity).

Geohash encoding diagram showing recursive bisection: longitude split first, then latitude, building up bit stream 01011... that maps to base-32 characters like 9q8yy for San Francisco Keep halving the map until you get a cell the size you need. Shared prefix = geographically close.

The string length controls precision:

Length	Cell size
4	~39 km x 20 km
5	~4.9 km x 4.9 km
6	~1.2 km x 0.6 km
7	~153 m x 153 m

The rule: choose the precision level where the cell size is smaller than your search radius. For a "within 5 km" query, precision 5 (4.9 km cells) is the right choice. For "within 500 m," use precision 7.

Store a geohash column in the businesses table at precision 6 and index it:

ALTER TABLE businesses ADD COLUMN geohash VARCHAR(8);
CREATE INDEX idx_geohash ON businesses (geohash);

SELECT * FROM businesses WHERE geohash = '9q8yy9';

That's an equality lookup on an indexed string column. Fast at any scale.

The 9-Cell Query: Where Most Designs Break

Here's the part most candidates miss.

A user standing at the edge of cell 9q8yy might be closest to a business in the adjacent cell 9q8yz. Query only the user's own cell and you miss it entirely. Your app tells them "no restaurants nearby" while they're standing 40 meters from a taqueria. This is bad.

Always query the center cell plus all 8 neighbors, forming a 3x3 grid of 9 cells.

3x3 grid of geohash cells showing a user positioned at the edge of the center cell, with a business just across the boundary in an adjacent cell, illustrating why single-cell queries miss nearby results User at cell edge. Business 40 meters away in the next cell. Query just the center? Missed. Always use 9 cells.

import geohash  # python-geohash library

user_hash = geohash.encode(lat, lng, precision=5)
neighbors = list(geohash.neighbors(user_hash).values())  # 8 neighbors
cells = [user_hash] + neighbors  # 9 total

# SQL
cursor.execute(
    "SELECT business_id, lat, lng FROM businesses WHERE geohash IN %s",
    (tuple(cells),)
)

But geohash cells are rectangles, not circles. The 9-cell grid includes the corners of the bounding rectangle, which extend past your intended radius. You need one more step.

Haversine: The Exact Filter

After the geohash query returns maybe a few hundred candidates, apply the Haversine formula to compute exact great-circle distance and discard anything beyond the radius:

from math import radians, sin, cos, sqrt, atan2

def haversine_km(lat1, lng1, lat2, lng2):
    R = 6371  # Earth radius in km
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * R * atan2(sqrt(a), sqrt(1 - a))

results = [
    b for b in candidates
    if haversine_km(user_lat, user_lng, b.lat, b.lng) <= radius_km
]

Haversine filter diagram: 3x3 rectangle of geohash cells with a circular search radius overlaid. Businesses inside the circle are green. Businesses in the grid corners but outside the circle are gray with X marks, showing what gets filtered out. The rectangle gives you speed. The circle gives you accuracy. You need both.

Geohash cuts 200 million rows to a few hundred candidates. Haversine cuts those to the exact right set. Both steps are necessary: geohash alone returns wrong results; Haversine alone against the full table is too slow.

The Full Search Path

A user searches for restaurants within 5 km. End to end:

LBS receives GET /search?lat=37.77&lng=-122.42&radius=5000&category=restaurant
Map 5 km radius to precision 5. Compute user's geohash: 9q8yy.
Compute 8 neighbors. Build the 9-cell list.
Pipeline 9 Redis lookups in parallel: SMEMBERS geohash:5:9q8yy, etc. Each key holds a set of business_ids.
Cache miss? Query the geospatial index table in the database for the missing cells.
Fetch business details (name, rating, address) from a second Redis cache keyed by business_id.
Apply Haversine. Sort by distance or rating. Return top 20.

Two Redis hops. Everything else is the exception. Target p99 is 100 ms; this hits around 10-12 ms on the hot path.

The two Redis hops handle the vast majority of traffic. Database reads are the exception, not the norm.

Geohash vs Quadtree

Geohash uses a fixed grid. A precision-5 cell in Manhattan and a precision-5 cell in rural Kansas cover identical geographic area, even though one contains thousands of businesses and the other contains zero. Hot cells in dense areas require more work per query and are harder to cache uniformly.

A quadtree solves this by splitting cells only when they accumulate more businesses than a threshold (say, 100 per leaf node). Manhattan gets subdivided 10+ levels deep. A rural cell stays near the root. Every leaf ends up with roughly the same number of businesses, so every query does roughly the same amount of work.

The interview tradeoff: geohash is simpler to implement, cache, and explain; quadtree handles uneven density better. In most interviews, geohash is the right answer. Mention quadtree as the density-adaptive alternative that systems like Uber use for real-time driver locations, where rides cluster heavily by time and neighborhood.

Geohash wins interviews. Quadtree wins cities. Know which one you're in.

Caching Is Unusually Effective Here

Location queries have a property most systems don't: they cluster geographically. Thousands of users in downtown San Francisco send queries that resolve to the same 9 geohash cells. Cache the geohash -> [business_id_list] mapping in Redis with a 10-minute TTL and you serve most searches entirely from memory.

This is the part interviewers want to see you get excited about. It's a genuinely elegant property.

Cache layout:

Key: geohash:{precision}:{cell} (e.g., geohash:5:9q8yy)
Value: Redis Set of business_ids
TTL: 10 minutes for hot cells, 1 hour for cold cells

Business details go into a second Redis cache keyed by business_id. Two Redis lookups cover both hops in the search path for nearly all traffic.

For the database tier, read replicas handle the read load while the primary handles only writes. The LBS routes all reads to replicas. For a deeper look at cache invalidation strategies beyond TTL, the caching guide covers each pattern.

The Write Path: Async, Eventually Consistent

A business owner updates their address. The flow:

Business Service validates and writes to the business DB primary.
An async worker recomputes the geohash for the updated coordinates.
Worker deletes the old (geohash, business_id) row from the geospatial index and inserts the new one.
Cache entries for the affected cells get invalidated (delete-on-write; next read repopulates).

Eventual consistency is explicitly correct here. A restaurant updating its hours at 2 PM appearing in search results at 2:05 PM is fine. This eliminates distributed transactions from the design entirely. If your interviewer pushes on consistency, the right move is to ask: "Is there a business requirement that requires synchronous update visibility?" There usually isn't.

The 45-Minute Clock

Phase	Time	Focus
Requirements + estimates	5 min	Two services, 18K QPS, read-heavy
API + data model	5 min	GET /search params, business table, geospatial_index table
High-level design	10 min	LBS, Business Service, Redis cluster, DB
Spatial index deep dive	10 min	Geohash, 9-cell query, Haversine, quadtree tradeoff
Scaling + caching	8 min	Cell cache strategy, read replicas, write path
Wrap-up	2 min	Consistency model, extensions (ratings, photos)

The most common failure mode: spending 15 minutes on geohash bit interleaving and never reaching caching or scaling. The interviewer cares about why range scans fail, how you handle cell boundaries, and how caching works at scale. Get through the encoding mechanics in 3 minutes. Your interviewer does not need you to manually demonstrate base-32 conversion; they need to know you understand why the string prefix property enables fast lookups.

Most candidates design only the search path and get caught when the write questions come. Know the async worker pattern and be able to justify eventual consistency in one sentence.

If you want to practice this under real time pressure, SpaceComplexity runs voice-based system design interviews with rubric feedback on exactly this kind of problem.

Recap

Two-column lat/lng range scans don't scale. You need a spatial index.
Geohash encodes 2D location as a short string where shared prefix means geographic proximity.
Choose precision by search radius: 5 km maps to precision 5, roughly 4.9 km cells.
Always query 9 cells (center plus 8 neighbors) to avoid boundary misses.
Apply Haversine after the geohash pre-filter to get the exact result set.
Cache the geohash -> [business_ids] mapping in Redis. Location queries cluster; cache hit rate is very high.
Separate Business Service (writes) from LBS (reads). Keep LBS stateless.
Eventual consistency is correct for business data. No distributed transactions needed.
Quadtree is the density-adaptive alternative for uneven geographic distributions.