Google Maps System Design Interview: Five Systems, One Framework

You have 45 minutes. You just got asked to design Google Maps. Your brain goes: database of roads, Dijkstra, done. You say this out loud. The interviewer nods. They write something. You do not get to see what they wrote, but I can tell you: it is not good.

Google Maps is not one system. It is five separate systems bolted together, each of which would be a reasonable standalone interview question on its own. The candidate who walks in thinking "maps equals shortest path" is going to describe about 20% of what actually exists, and the interviewer is going to spend the next 40 minutes watching them flounder around the other 80%.

You do not need to design all five. You need to know which five exist, pick two or three to go deep on, and explain your choices out loud. That meta-skill, the ability to scope a massive system in real time, is what the interviewer is actually evaluating.

Using maps before Google Maps was created starterpack Before someone had to design this thing, we printed MapQuest directions and folded them incorrectly. Now it is five distributed systems and a DeepMind GNN. Progress.

Clarify Before You Touch the Marker

Google Maps has three core user journeys. You need to pick at most two to own.

Map browsing: pan, zoom, and see a visual map
Search: find a place by name, category, or address
Navigation: get a route with turn-by-turn directions, updated for traffic

In a 45-minute interview, you can deeply cover one or two of these. Deeply. The recommended pairing: map browsing plus navigation with traffic. That combination forces you through the two most interesting and technically distinctive parts of the system. Plus when you say "Contraction Hierarchies" instead of "Dijkstra," you pass.

Say the Numbers Out Loud Before You Draw

State scale before you draw anything. Interviewers want to see you reason from numbers, not from vibes.

1 billion active users, up to 20 million navigating simultaneously at peak
10 billion map tile requests per day
Route computation: under 500 ms at P99
Tile delivery: under 100 ms (CDN-served)
99.99% availability (navigation failures during emergencies have real consequences)
Read-heavy: tiles are read billions of times, written only when roads change

The constraint that matters most: a system that serves 10 billion tile requests per day cannot generate tiles on demand. Full stop. That single observation locks in the entire map-serving architecture before you've drawn a single box.

Name All Five Subsystems Upfront

Five Google Maps subsystems: client connects to CDN, Place Search, and Routing. Routing reads from Traffic. Traffic ingests from Location Pipeline. The five subsystems. Name all of them in the first five minutes, then tell the interviewer which two you're going deep on. This alone puts you ahead of most candidates.

Subsystem	Core design problem
Map Tile Service	Pre-generate and cache petabytes of image tiles
Routing Service	Run shortest-path on a global graph in under 500 ms
Traffic Pipeline	Aggregate billions of GPS probes into per-road-segment speeds
Place Search	Combine geospatial filtering with full-text ranking
Location Ingestion	Receive real-time device positions without becoming a surveillance system

Name all five in the first five minutes. Then announce which two you're going deep on and why. You will not have time for all five. That is fine. The interviewer knows that. What they're watching for is whether you know what you're skipping and why.

Nobody Renders a Map When You Ask For It

The map you see in your browser is not rendered on request. It is a pre-computed image generated and cached before you ever opened the app.

The world is divided into a tile pyramid using the XYZ scheme. At zoom level 0, the entire world fits in one 256x256 pixel tile projected onto a square via Web Mercator (EPSG:3857). Each additional zoom level divides every tile into four, so at zoom level z there are 4^z total tiles. Google Maps goes to zoom 22. The full pyramid contains over a trillion tiles. Not all are generated (oceans stay sparse), but the number is large enough that on-demand rendering is not a conversation we're having.

XYZ tile pyramid showing zoom levels 0 through 22 with tile counts and the deterministic URL scheme Each zoom level multiplies tile count by 4. At zoom 22 you're looking at individual buildings. The URL is deterministic, which is the whole reason CDN caching works so well.

The tile request flow:

The client computes which tiles are visible given its viewport and zoom level. Tile URLs are deterministic: /{z}/{x}/{y}.png. This determinism is the key to caching.
The request hits the CDN edge nearest to the user. Cache hit rates for tiles exceed 90% globally because cities and common zoom levels are always warm.
On a CDN miss, the request falls to the Tile Service, which reads from object storage (the equivalent of GCS or S3) where pre-generated tiles live.
When roads change, the rendering pipeline regenerates the affected tiles and evicts the corresponding CDN keys.

Map tile request path from phone through CDN (90%+ hits) to Tile Service to Object Storage, with offline tile generation pipeline feeding storage The CDN is not a performance optimization on top of the real system. The CDN IS the serving strategy.

The CDN is not an optimization bolted on top of the real system. The CDN IS the serving strategy. Static tiles are immutable objects with deterministic URLs. They are basically a CDN engineer's dream. You could not have designed a workload more perfectly suited to caching if you tried.

Raster tiles (PNG) are simple to generate and universally compatible, but vector tiles are smaller and render client-side, so the user can switch between dark mode, transit view, and satellite without fetching new tiles. Google uses vector tiles. In an interview: mention both, say you would start with raster for simplicity, migrate to vector once the pipeline is stable. Raster first is not wrong. It is just not what Google does.

Dijkstra on a Planet-Scale Graph is a Joke

The global road network has roughly 400 million nodes (intersections) and 900 million edges (road segments). Running vanilla Dijkstra's algorithm on that takes seconds. A user waiting for a route in 500 ms cannot sit through seconds. This is the gap most candidates fall into: they propose the textbook algorithm and stop there, as if scale is someone else's problem.

The solution is Contraction Hierarchies, developed at the Karlsruhe Institute of Technology. Two phases.

Preprocessing. Rank every node by "importance," a heuristic based on edge count and how many shortcuts would be needed to preserve shortest paths if the node were removed. Contract nodes from least important (a dead-end residential street) to most important (a highway interchange). When you contract node v, add shortcut edges between v's neighbors wherever a shortest path ran through v. The result is a layered graph: residential streets at the bottom, arterials in the middle, highways at the top.

Query. Run bidirectional Dijkstra from source and destination simultaneously, but with one constraint: each search only follows edges that lead to more important nodes. The two searches meet near the top of the hierarchy. Because you never explore small side streets when a route uses highways, the query visits only a few hundred nodes and finishes in under a millisecond on a continental graph.

Contraction Hierarchies: three bands (residential/arterial/highway), forward search from source and backward search from destination both moving upward, meeting at a highway node. Dashed shortcut edges shown. Forward search from the origin and backward search from the destination both climb toward the highway layer and meet in the middle. No residential streets get explored for a cross-country route. This is why it's fast.

The actual travel time on each edge is not static. Before the query runs, the routing service fetches current speed estimates for relevant road segments from the Traffic Store. The same Contraction Hierarchies query runs, but with traffic-adjusted weights. Preprocessing the graph once, updating edge weights at runtime. The hierarchy stays intact; only the costs change.

Your Phone is a Traffic Sensor (Surprise)

Google Maps knows you're in traffic without asking. Every device running Google Maps with location sharing enabled sends anonymous GPS probes every few seconds: timestamped latitude/longitude. At scale, this is tens of billions of probes per day. You are crowdsourcing traffic data every time you navigate somewhere. Hope that is fine with you.

The traffic system is a streaming aggregation pipeline, not a query system. The data flows in one direction. Fast.

Traffic pipeline: devices post GPS probes to Ingestion API, into Kafka, through Map Matcher, Speed Aggregator, into Traffic Store, read by Routing Service at query time. Below: dashed arrow to ETA Model. Phone sends GPS coordinates. System figures out which road segment you're on, how fast you're moving, and updates a key-value store. Routing service reads from that store at query time. The ETA model gets the full historical picture.

The stages:

Ingestion. Devices POST probes to the Location API. Identifiers are anonymized before leaving the device. The server validates and enqueues into Kafka for durable buffering.
Map matching. A stream processor (Flink or equivalent) snaps each noisy GPS coordinate to the nearest road segment using the road graph. This is the hard part. GPS is noisy. Tunnels create gaps. A coordinate 30 meters off could be on a completely different street. The map matcher has to make a probabilistic call every time, at billions-of-events-per-day scale.
Speed aggregation. Per road segment, the system computes a moving-window average speed from probes in the last few minutes. Edges with no fresh data fall back to historical baselines for that time of day and day of week.
Serving. Speed estimates are written to a low-latency key-value store keyed by edge ID. The routing service reads from it at query time.

ETA prediction sits on top of all of this. Google partnered with DeepMind to train Graph Neural Networks on billions of historical trips, incorporating weather, turn density, and traffic signal counts. For an interview answer, you do not need to explain how GNNs work. Knowing this layer exists and what it consumes is enough. Say "ML-based ETA model on top of the aggregated speeds" and move on.

"Coffee Shops Near Me" is Two Searches, Not One

Finding "coffee shops near me" requires two searches at once: a geospatial filter (things within radius R of my location) and a relevance ranking (which ones actually match "coffee shop").

The geospatial layer uses Google's S2 geometry library, which maps the Earth's spherical surface onto a unit cube and subdivides each face into a hierarchical grid of cells. At level 16, each S2 cell covers roughly 1 square kilometer. To search within 2 km of a point, you generate a covering of S2 cells over that circle (typically 4 to 10 cells at mixed levels) and query for all places whose S2 cell ID falls in that set. Since cell IDs are 64-bit integers, this is a fast range scan on a sorted index.

City map with user location, 2km radius circle covered by 6 S2 cells labeled with hex IDs, arrow pointing to sorted place index, then text ranking stage below The spatial query narrows 200 million businesses worldwide down to a few thousand candidates. Text ranking picks the best ones from there. Two stages, deliberately separate.

The spatial query narrows candidates from 200 million businesses worldwide down to a few thousand. A text ranking model then scores those candidates on query match, rating, distance, and popularity. The two stages are deliberately separate. Spatial filtering is exact and fast. Relevance ranking is approximate and tunable. Mixing them together would make both worse.

Four Endpoints, No Magic

Four endpoints cover the core journeys:

GET /v1/tiles/{z}/{x}/{y}
  → 200  image/png (or application/vnd.mapbox-vector-tile)

GET /v1/places?q={text}&lat={lat}&lng={lng}&radius={meters}
  → 200  { places: [{ id, name, location, category, rating }] }

GET /v1/route?origin={lat,lng}&dest={lat,lng}&mode={driving|walking|transit}
  → 200  { steps: [...], duration_sec, distance_m, polyline }

POST /v1/location
  body: { device_id_hash, lat, lng, timestamp, speed }
  → 204  (fire-and-forget)

The tile endpoint is GET and cacheable. The location endpoint is POST and fires-and-forgets. The device does not wait for a 200. It moves on. Both endpoints need to be cheap because each runs billions of times per day.

Three Storage Shapes, Not One Database

Do not put everything in Postgres and call it a day. Three distinct storage shapes:

Road graph. Nodes (node_id, lat, lng) and directed edges (from_node, to_node, distance_m, road_class, base_speed_kmh). Stored in a custom binary format per routing partition tile, not a general-purpose database.
Places. (place_id, name, category, lat, lng, s2_cell_id_l16, rating, review_count). Indexed on (s2_cell_id_l16, category) for geospatial queries, plus a full-text index on name.
Map tiles. Object storage keyed by (z, x, y). No relational structure. These are blobs.

Raise These Tradeoffs Before the Interviewer Asks

A good interviewer will probe each of these. Get there first.

Graph currency vs. query speed. Contraction Hierarchies require a preprocessing step that takes hours on a global graph. When a road closes, you cannot re-preprocess the hierarchy instantly. Google uses a two-tier approach: the static CH handles baseline routing, and a live-traffic overlay adjusts edge weights at query time without reprocessing the hierarchy. The shape stays fixed; the weights change.

Offline maps. Downloading a region bundles the routing tiles, map tiles, and places for a bounding box. The routing algorithm runs locally on device. This works. Data freshness is the hard problem: how do you push updates to a device that is, by definition, offline?

Privacy vs. traffic fidelity. More frequent probes mean better traffic data but more granular location tracking. On-device aggregation (reporting segment speeds rather than raw GPS traces), probe subsampling, and differential privacy noise are the standard mitigations. The server could poll for location rather than devices pushing, but at scale device-initiated push is far cheaper on the server. The push vs. pull tradeoff has a direct parallel here, if you want to use that language in the interview.

The Time Budget (Stick To It)

Horizontal timeline: 0-5 clarify, 5-10 name subsystems, 10-22 map tiles deep dive, 22-37 routing deep dive, 37-43 traffic breadth, 43-45 tradeoffs. Warning note: saying Dijkstra instead of Contraction Hierarchies ends the interview. The time budget. Routing gets the most time because Contraction Hierarchies is the signal the interviewer is listening for. Everything else is context.

The mistake that ends interviews: spending 20 minutes on a generic three-tier web app and then saying "the routing service runs Dijkstra." Say Contraction Hierarchies. Explain the preprocessing step. That specificity is the whole game.

You will not finish all five subsystems. That is expected and fine. What the interviewer is watching for: did you scope correctly, go deep with precision on two systems, and say out loud what you chose to skip and why?

The Six Things You Cannot Afford to Forget

Name all five subsystems in the first few minutes, then commit to two.
Map tiles are pre-computed, immutable, and served from CDN. Deterministic URLs make them perfectly cacheable.
Routing uses Contraction Hierarchies. Bidirectional upward search on a preprocessed hierarchy terminates in milliseconds on a continental graph. Raw Dijkstra does not scale.
Traffic is a streaming pipeline: GPS probes, map matching, moving-window speed aggregation, live edge weights. Separate the ingestion concern from the query concern.
Place search is two stages in sequence: S2 geospatial filter to narrow candidates, text ranking to sort them.
Raise tradeoffs before the interviewer asks. Graph currency, offline mode, and privacy are all fair game.

Knowing the architecture and explaining it clearly under time pressure with someone asking follow-up questions are two completely different skills. You can read this article and understand every diagram. Then you get on a call and describe a "database of roads and Dijkstra" because that is what comes out when you are nervous. SpaceComplexity runs voice-based system design and DSA mock interviews with rubric feedback across communication, problem-solving, and technical depth. It is the practice environment that actually matches the conditions you're preparing for.