Design a Hotel Booking System: The Airbnb Interview, Step by Step

May 27, 2026 · 11 min read
TL;DR
  • Two separate paths: search (Elasticsearch, eventual consistency) and booking (Postgres, strong consistency) are fundamentally different problems with opposite requirements
  • One row per (room_id, date) with a version field gives you row-level optimistic locking that prevents double bookings without serializing all concurrent requests
  • Soft hold first: Phase 1 places a RESERVED status via a conditional UPDATE; Phase 2 calls payment and either confirms or releases the hold via a compensating transaction
  • CDC pipeline (Postgres WAL → Debezium → Kafka → Elasticsearch) keeps the search index near-real-time without coupling search reads to the booking database
  • Redis DECR gate in front of Postgres absorbs burst traffic on hot listings before contention reaches the database
  • Idempotency keys on every booking creation and payment call prevent duplicate records from network retries on mobile clients

The hard part isn't designing the listing page. Any engineer can sketch that.

The challenge is recognizing that a hotel booking system is actually two completely different systems with opposite requirements. Search tolerates stale data and needs speed. Booking cannot tolerate inconsistency and needs correctness. Conflating them is what gets most candidates stuck. It's also what causes the occasional double-booking in production, but that's someone else's pager alert.

Scope It Before You Draw Anything

Spend the first five minutes here. Yes, even if you've practiced this exact question. A candidate who jumps straight to databases signals they haven't thought about the problem. An interviewer watching you draw tables before clarifying requirements will interrupt.

Functional requirements to confirm:

  • Hosts create and manage property listings with availability calendars
  • Guests search by location, dates, and guest count
  • Guests book and pay; confirmation appears within 3 seconds
  • Cancellation and refund flows
  • Reviews after checkout

Ask whether instant booking is the default or if hosts manually approve each request. That one question changes the booking state machine significantly. Instant booking means your service makes the decision. Manual approval means you hold inventory while the host responds, with a timer.

Non-functional requirements:

  • Search under 200ms P99
  • Zero double bookings (strong consistency on inventory)
  • 99.99% uptime for the booking path
  • 7 million active listings, 100 million MAU

Two Numbers That Drive the Design

The read/write ratio for this system is extreme. Roughly 1,000 reads happen for every write. Guests browse dozens of listings before committing to one booking.

Scale estimate:

  • 10M daily active users, 10 searches per user per day = 100M searches/day = ~1,200 QPS average, ~6,000 QPS peak
  • 1M bookings per day = ~12 bookings/sec average, ~60/sec peak
  • 7M listings at 10 photos each, ~200KB per photo = ~14TB for images
  • Listing data read:write ratio is approximately 1,000:1

These two numbers force two separate paths with opposite consistency models. Search is high-throughput and latency-sensitive. Booking is low-throughput and correctness-critical.
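The arithmetic behind those estimates is worth being able to reproduce on the spot. The sketch below only restates the inputs from the list above; the 5x peak factor is an assumption chosen to match the ~6,000 and ~60 peak figures, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average events per second for a daily total."""
    return events_per_day / SECONDS_PER_DAY


daily_active_users = 10_000_000
searches_per_user = 10
bookings_per_day = 1_000_000
peak_factor = 5  # assumption: peak traffic is roughly 5x the daily average

searches_per_day = daily_active_users * searches_per_user  # 100,000,000
search_qps = per_second(searches_per_day)                  # about 1,157
booking_qps = per_second(bookings_per_day)                 # about 11.6

# Image storage: 7M listings x 10 photos x 200 KB each (decimal units).
image_terabytes = 7_000_000 * 10 * 200_000 / 1e12

print(f"search:  {search_qps:,.0f}/s average, {search_qps * peak_factor:,.0f}/s peak")
print(f"booking: {booking_qps:,.1f}/s average, {booking_qps * peak_factor:,.0f}/s peak")
print(f"images:  {image_terabytes:.0f} TB")
```

Searches alone outnumber bookings 100 to 1 in these figures. The 1,000:1 ratio quoted for listing data presumably also counts listing-page and calendar reads, which are not broken out in the estimate.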

Five Services, Two Paths

Figure: Hotel booking system two-path architecture: read path through Search Service and Elasticsearch, write path through Booking Service and PostgreSQL, with CDC pipeline and async notification.
Read and write paths split at the API gateway. CDC keeps Elasticsearch in sync within seconds of every write.

Five services carry the load:

Listing Service: Hosts create, update, and delete properties. Writes go to Postgres. A CDC pipeline (Debezium reads the Postgres write-ahead log, publishes to Kafka, and an Elasticsearch consumer updates the search index) syncs listing data near-real-time.

Search Service: Query-only. Reads from Elasticsearch. No writes. Handles geo queries, filters, and ranking.

Booking Service: The consistency-critical path. Reads and writes to Postgres with row-level locking. This service prevents double bookings.

Payment Service: Calls an external gateway such as Stripe or Adyen. Every request carries an idempotency key.

Notification Service: Async Kafka consumer. Sends email and push confirmations after booking events.

Never route binary assets through your application tier. The CDN sits in front of S3 for all images and removes that entire traffic class from your services with no application code changes.

What Lives in the Database

-- One property per listing
properties (id, host_id, title, lat, lng, address, amenities, max_guests, created_at)

-- Rooms within a property (one for Airbnb-style, many for hotel-style)
rooms (id, property_id, name, max_occupancy, price_per_night_cents)

-- One row per (room, date). The source of truth for availability.
room_inventory (room_id, date, total_count, reserved_count, version)

-- The booking record
bookings (
  id, user_id, room_id, check_in, check_out,
  status,            -- PENDING | RESERVED | CONFIRMED | CANCELLED
  total_price_cents,
  idempotency_key,   -- unique constraint
  expires_at, created_at
)

-- Payment record
payments (id, booking_id, amount_cents, status, external_ref, idempotency_key)

Figure: Hotel booking system entity-relationship diagram: properties to rooms one-to-many, rooms to room_inventory one-to-many with one row per date, rooms to bookings one-to-many, bookings to payments one-to-one.
Every table in the booking path. The version field on room_inventory is how optimistic concurrency control works.

One row per (room_id, date) is the key design choice. It gives you row-level locking at exactly the right granularity. Two simultaneous booking attempts for the same room on the same night compete for the same row, and Postgres handles the serialization. The version field enables optimistic concurrency control. The reserved_count tracks soft holds; total_count is the hard cap.
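To make the one-row-per-night model concrete, here is a minimal sketch that seeds room_inventory for a single room and maps a stay onto the rows it would touch. It uses Python's built-in sqlite3 as a stand-in for Postgres; the helper names (stay_nights, create_inventory) and the CHECK constraint are assumptions added for illustration, not part of the schema above.

```python
import sqlite3
from datetime import date, timedelta


def stay_nights(check_in: date, check_out: date) -> list:
    """Nights occupied by a stay: check-in inclusive, check-out exclusive."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


def create_inventory(conn, room_id: int, first_day: date, days: int, total_count: int = 1):
    """Create the table if needed and insert one row per (room_id, date)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS room_inventory (
               room_id        INTEGER NOT NULL,
               date           TEXT    NOT NULL,
               total_count    INTEGER NOT NULL,
               reserved_count INTEGER NOT NULL DEFAULT 0,
               version        INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (room_id, date),
               CHECK (reserved_count BETWEEN 0 AND total_count)
           )"""
    )
    rows = [
        (room_id, (first_day + timedelta(days=i)).isoformat(), total_count)
        for i in range(days)
    ]
    conn.executemany(
        "INSERT INTO room_inventory (room_id, date, total_count) VALUES (?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_inventory(conn, room_id=1, first_day=date(2026, 6, 1), days=30)
nights = stay_nights(date(2026, 6, 1), date(2026, 6, 4))
print([n.isoformat() for n in nights])
```

A three-night stay touches exactly three rows, and the checkout date is not one of them, so a guest leaving on June 4 never contends with a guest arriving that day.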

How the API Looks

GET  /api/v1/search?lat=&lng=&radius_km=&checkin=&checkout=&guests=&page=
     → { listings: [...], next_cursor }

GET  /api/v1/properties/{id}
     → { property, rooms, next_available }

POST /api/v1/bookings
     body: { room_id, checkin, checkout, guests, idempotency_key }
     → { booking_id, status: "RESERVED", expires_at }

POST /api/v1/bookings/{id}/pay
     body: { payment_method_id }
     → { booking_id, status: "CONFIRMED" }

DELETE /api/v1/bookings/{id}
     → { status: "CANCELLED" }

The idempotency_key is not optional. When a mobile client retries after a dropped connection, you need to know whether the original request landed. Store it with a unique constraint and return the cached response on duplicate keys. Without it, a network hiccup creates two bookings for the same room.
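A minimal sketch of that behavior, assuming the unique constraint lives on the bookings table and using sqlite3 in place of Postgres. The function names and the trimmed-down column set are hypothetical; a production version would also cache the full response body rather than rebuilding it from the row.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE bookings (
               id              INTEGER PRIMARY KEY AUTOINCREMENT,
               room_id         INTEGER NOT NULL,
               status          TEXT    NOT NULL,
               idempotency_key TEXT    NOT NULL UNIQUE
           )"""
    )
    return conn


def create_booking(conn: sqlite3.Connection, room_id: int, idempotency_key: str) -> dict:
    """Insert a booking once per key; a retry gets the original booking back."""
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute(
                "INSERT INTO bookings (room_id, status, idempotency_key) "
                "VALUES (?, 'RESERVED', ?)",
                (room_id, idempotency_key),
            )
    except sqlite3.IntegrityError:
        pass  # duplicate key: the first request already landed
    row = conn.execute(
        "SELECT id, status FROM bookings WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    return {"booking_id": row[0], "status": row[1]}


conn = open_db()
first = create_booking(conn, room_id=7, idempotency_key="key-1")
retry = create_booking(conn, room_id=7, idempotency_key="key-1")  # client retried
```

The database, not the application, decides whether the key has been seen, so two retries racing each other still produce one row.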

The Booking Flow: Where Candidates Lose Points

Figure: Hotel booking state machine: PENDING to RESERVED on Book Now click, RESERVED to CONFIRMED on payment success, RESERVED or CONFIRMED to CANCELLED on timer expiry or payment failure.
The 15-minute hold is the entire design. Everything else is bookkeeping.
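The state machine is small enough to write down as a lookup table. This sketch encodes only the transitions named in the diagram; the ALLOWED table and the transition function are illustrative names, not an existing API.

```python
# Allowed next states for each booking status, taken from the diagram.
ALLOWED = {
    "PENDING": {"RESERVED"},
    "RESERVED": {"CONFIRMED", "CANCELLED"},
    "CONFIRMED": {"CANCELLED"},
    "CANCELLED": set(),  # terminal
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


status = "PENDING"
status = transition(status, "RESERVED")   # guest clicked Book Now
status = transition(status, "CONFIRMED")  # payment succeeded
```

Rejecting anything outside the table is what stops a late payment callback from reviving a booking whose hold already expired.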

This is a two-phase process. Interviewers are looking for exactly this detail, so walk through each step.

Phase 1: Reserve. When the guest clicks "Book Now", Booking Service places a soft hold:

BEGIN;

-- Hold one unit on each night of the stay. The checkout date is not a night, so it gets no pair.
-- version is per row, so each night is matched against the version read for that night.
UPDATE room_inventory
SET reserved_count = reserved_count + 1,
    version = version + 1
WHERE room_id = ?
  AND (date, version) IN ((?, ?), (?, ?))  -- optimistic lock: fail if another transaction snuck in
  AND (total_count - reserved_count) > 0;

-- If rows_affected < nights requested → conflict → ROLLBACK → return 409

INSERT INTO bookings (..., status, expires_at)
VALUES (..., 'RESERVED', NOW() + INTERVAL '15 minutes');

COMMIT;

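Phase 2: Pay and confirm. The guest submits payment. Payment Service calls the gateway with the booking's idempotency key. On success, Booking Service sets status to CONFIRMED. On payment failure, Booking Service releases the hold as the compensating step: it decrements reserved_count for each night of the stay and sets status to CANCELLED. If the guest never pays, a background worker does the same once expires_at has passed.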
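Here is a runnable sketch of the hold and its release, again with sqlite3 standing in for Postgres. To stay short it relies on the availability predicate alone rather than matching versions, and it leaves out the bookings insert; open_inventory, reserve, and release are hypothetical helper names.

```python
import sqlite3
from datetime import date, timedelta


def open_inventory(total_count: int = 1) -> sqlite3.Connection:
    """In-memory room_inventory: room 1, seven nights starting 2026-06-01."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE room_inventory (room_id INTEGER, date TEXT, total_count INTEGER, "
        "reserved_count INTEGER DEFAULT 0, version INTEGER DEFAULT 0, "
        "PRIMARY KEY (room_id, date))"
    )
    first = date(2026, 6, 1)
    conn.executemany(
        "INSERT INTO room_inventory (room_id, date, total_count) VALUES (1, ?, ?)",
        [((first + timedelta(days=i)).isoformat(), total_count) for i in range(7)],
    )
    conn.commit()
    return conn


def reserve(conn, room_id: int, check_in: date, check_out: date) -> bool:
    """Phase 1: hold every night of the stay, or none of them."""
    nights = (check_out - check_in).days
    if nights <= 0:
        raise ValueError("check_out must be after check_in")
    cur = conn.execute(
        "UPDATE room_inventory "
        "SET reserved_count = reserved_count + 1, version = version + 1 "
        "WHERE room_id = ? AND date >= ? AND date < ? AND reserved_count < total_count",
        (room_id, check_in.isoformat(), check_out.isoformat()),
    )
    if cur.rowcount < nights:  # at least one night was already full
        conn.rollback()        # undo the partial hold; the caller returns 409
        return False
    conn.commit()
    return True


def release(conn, room_id: int, check_in: date, check_out: date) -> None:
    """Compensating step: give the nights back after payment failure or expiry."""
    conn.execute(
        "UPDATE room_inventory "
        "SET reserved_count = reserved_count - 1, version = version + 1 "
        "WHERE room_id = ? AND date >= ? AND date < ? AND reserved_count > 0",
        (room_id, check_in.isoformat(), check_out.isoformat()),
    )
    conn.commit()
```

The rollback matters as much as the update. A stay that overlaps an existing hold on one night would otherwise leave the other nights reserved for a booking that was never created.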

Figure: Optimistic concurrency control race condition timeline: Request A and Request B both read version 5, Request A's UPDATE matches and commits, Request B finds version 6 and gets rows_affected=0, rolls back with 409 Conflict.
Request A wins the race. Request B sees a stale version, gets zero rows affected, and rolls back. No double booking. Postgres does all the hard work.
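The race in that timeline can be replayed in a few lines. This is a single-row sketch using sqlite3 in place of Postgres, with the two requests run one after the other to mimic the interleaving; read_version and try_hold are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE room_inventory (room_id INTEGER, date TEXT, total_count INTEGER, "
    "reserved_count INTEGER, version INTEGER, PRIMARY KEY (room_id, date))"
)
# Last available night for room 1, currently at version 5.
conn.execute("INSERT INTO room_inventory VALUES (1, '2026-06-01', 1, 0, 5)")
conn.commit()


def read_version(room_id: int, day: str) -> int:
    row = conn.execute(
        "SELECT version FROM room_inventory WHERE room_id = ? AND date = ?",
        (room_id, day),
    ).fetchone()
    return row[0]


def try_hold(room_id: int, day: str, seen_version: int) -> bool:
    """Compare-and-swap: succeeds only if the row is unchanged since it was read."""
    cur = conn.execute(
        "UPDATE room_inventory "
        "SET reserved_count = reserved_count + 1, version = version + 1 "
        "WHERE room_id = ? AND date = ? AND version = ? AND reserved_count < total_count",
        (room_id, day, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1


# Both requests read before either one writes.
version_a = read_version(1, "2026-06-01")
version_b = read_version(1, "2026-06-01")

a_won = try_hold(1, "2026-06-01", version_a)  # matches version 5, bumps it to 6
b_won = try_hold(1, "2026-06-01", version_b)  # still holding 5: zero rows affected
```

Request B never blocks. It finds out it lost from the row count and can return 409 immediately.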

Why not pessimistic locking? SELECT ... FOR UPDATE serializes all concurrent requests for the same room into a queue. Under load, that queue grows and latency degrades. Optimistic locking lets requests proceed concurrently, fails conflicts fast, and only serializes on actual races. When most booking attempts succeed on the first try, optimistic is the right default.

Why not two-phase commit across services? 2PC requires a coordinator. If that coordinator crashes between prepare and commit, every participant blocks. Indefinitely. That's technically correct behavior and also your on-call nightmare. The Saga pattern is safer. Each step has a compensating transaction. Payment fails and the hold releases. Each service recovers independently.
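A saga is easier to explain with something concrete to point at. The sketch below is an in-process toy, not a distributed implementation: each step is a pair of plain functions (an action and its compensation), and the step names are hypothetical stand-ins for calls to Booking Service and Payment Service.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo the completed ones in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log: List[str] = []


def place_hold() -> None:
    log.append("hold placed")


def release_hold() -> None:
    log.append("hold released")


def charge_card() -> None:
    raise RuntimeError("card declined")  # simulated gateway failure


def refund_card() -> None:
    log.append("refund issued")


ok = run_saga([(place_hold, release_hold), (charge_card, refund_card)])
```

The failed charge is never refunded because it never completed; only the hold, which did succeed, gets its compensation. In a real system each compensation would itself need to be idempotent and retried until it lands.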

Search Is a Different Problem

Guests enter a city, dates, and guest count. Speed and relevance matter here. Exact consistency does not.

Search Service queries Elasticsearch:

{
  "query": {
    "bool": {
      "must": [
        {
          "geo_distance": {
            "distance": "50km",
            "location": { "lat": 37.77, "lon": -122.41 }
          }
        },
        { "range": { "max_guests": { "gte": 2 } } },
        { "term": { "available_checkin": "2026-06-01" } }
      ]
    }
  }
}

Elasticsearch handles geo_distance queries natively. The availability index is synced from Postgres via the CDC pipeline described above. That sync has a lag of a few seconds.
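The consumer at the end of that pipeline has a simple job: replay row changes onto the index. The sketch below assumes a simplified version of Debezium's change-event envelope (an op code plus before and after row images) and uses a plain dict as a stand-in for the Elasticsearch index; the field values are made up for illustration.

```python
def apply_change(index: dict, event: dict) -> None:
    """Apply one change event to the search index, keyed by row id."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        doc = event["after"]
        index[doc["id"]] = doc
    elif op == "d":            # delete
        index.pop(event["before"]["id"], None)


events = [
    {"op": "c", "before": None, "after": {"id": 1, "title": "Loft", "max_guests": 2}},
    {"op": "c", "before": None, "after": {"id": 2, "title": "Cabin", "max_guests": 4}},
    {"op": "u", "before": {"id": 1, "title": "Loft", "max_guests": 2},
     "after": {"id": 1, "title": "Loft", "max_guests": 3}},
    {"op": "d", "before": {"id": 2, "title": "Cabin", "max_guests": 4}, "after": None},
]

index: dict = {}
for event in events:
    apply_change(index, event)
```

Because each upsert overwrites the whole document by id, replaying the same event twice leaves the index unchanged, which is what makes at-least-once delivery from Kafka acceptable here.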

Search results may show a listing as available that was booked two seconds ago. That's fine. You'll know soon enough. Postgres is the source of truth. Show a friendly error on the booking attempt when the inventory is gone. Don't make guests wait 200ms while you do a real-time consistency check across every search result.

Figure: Hotel booking search architecture with CDC sync pipeline: client to Search Service to Elasticsearch for reads, Postgres WAL to Debezium to Kafka to ES Consumer to Elasticsearch for sync, Redis cache for hot listings.
The CDC pipeline keeps search snappy without touching Postgres on every query. Redis sits in front for the listings that get hammered.

Cache hot listings in Redis, using LRU eviction to automatically drop the least-recently-accessed entries. A listing with 5,000 views per hour doesn't need 5,000 Postgres reads. Cache property details, pricing, and availability for the next 30 to 90 days. Invalidate on booking confirmation by publishing a cache-bust event to Kafka.
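If the interviewer asks what LRU eviction actually does, a small in-process model is enough to show it. This sketch is strict LRU built on OrderedDict, whereas Redis approximates LRU by sampling keys; the LruCache class and the listing keys are illustrative.

```python
from collections import OrderedDict


class LruCache:
    """Fixed-capacity cache that evicts the least-recently-used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None                    # cache miss: caller falls back to the database
        self._items.move_to_end(key)       # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def invalidate(self, key) -> None:
        """Cache-bust hook, e.g. called when a booking confirmation event arrives."""
        self._items.pop(key, None)


cache = LruCache(capacity=2)
cache.put("listing:1", {"title": "Loft"})
cache.put("listing:2", {"title": "Cabin"})
cache.get("listing:1")                      # listing:1 is now the most recent
cache.put("listing:3", {"title": "Villa"})  # evicts listing:2
```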

Where the System Breaks Under Load

Popular listings under peak demand. A viral listing can attract 10,000 booking attempts per minute for the last available night. Congratulations to one of them. Optimistic concurrency handles correctness: one request wins per date, the rest get 409. For very high contention, put a Redis DECR counter in front of Postgres. Set room:{id}:{date}:available on listing creation. Decrement atomically before attempting the SQL. If the counter returns negative, reject early without touching Postgres at all. Redis absorbs the burst; Postgres holds the durable ledger.
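The gate logic can be sketched without a Redis server. The class below is an in-memory stand-in that mimics the semantics of Redis DECR and INCR (atomic, returns the new value, missing keys start at 0); CounterGate and try_enter are hypothetical names. Restoring the counter after a rejected attempt is an assumption added here so the count stays meaningful when a hold is later released.

```python
import threading


class CounterGate:
    """In-memory stand-in for Redis counters with atomic DECR/INCR."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def set(self, key: str, value: int) -> None:
        with self._lock:
            self._counts[key] = value

    def get(self, key: str) -> int:
        with self._lock:
            return self._counts.get(key, 0)

    def decr(self, key: str) -> int:
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) - 1
            return self._counts[key]

    def incr(self, key: str) -> int:
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def try_enter(gate: CounterGate, room_id: int, day: str) -> bool:
    """Return True if the request may proceed to the Postgres UPDATE."""
    key = f"room:{room_id}:{day}:available"
    if gate.decr(key) < 0:
        gate.incr(key)  # assumption: give the unit back so the counter does not drift
        return False    # reject early, Postgres is never touched
    return True


gate = CounterGate()
gate.set("room:42:2026-06-01:available", 3)
```

A request that passes the gate still has to win the conditional UPDATE in Postgres, and if that fails the caller should increment the counter again. The gate only sheds load; it is not the source of truth.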

Search cluster capacity. Shard Elasticsearch by region: US, EU, APAC. Route each query to the shard that holds the region being searched, which is not always the region the guest is browsing from. Geo queries are naturally local, so most hit a single shard rather than triggering cross-shard scatter-gather.

Inventory table hotspots. Partition room_inventory by room_id % N. A booking only touches rows for a single room, so every hold stays inside one partition, while different rooms spread across partitions and no write has to cross them.

Images. Every listing photo goes through S3 plus a CDN. Binary asset traffic never touches your application tier.

Tradeoffs Worth Discussing Out Loud

(For the underlying patterns behind these choices, The Tradeoff Maze covers push-vs-pull and read-vs-write tradeoffs in distributed systems.)

Decision | Option A | Option B | Recommended
--- | --- | --- | ---
Booking consistency | Strong (Postgres ACID) | Eventual (Redis) | Strong: zero tolerance for double bookings
Concurrency control | Optimistic (version field) | Pessimistic (FOR UPDATE) | Optimistic at scale; pessimistic for simpler systems
Distributed transactions | Saga pattern | 2PC | Saga: no coordinator single point of failure
Search index | Elasticsearch | PostGIS on Postgres | ES for scale and richer filter and ranking support
Booking hold storage | Postgres row with expires_at | Redis key with TTL | Postgres: durable across crashes, simpler recovery logic

The answer that separates a strong hire: naming the consistency split before the interviewer asks. Say it early in the interview. "Search uses eventual consistency. Booking uses strong consistency. These are separate read and write paths, and I'll design them differently." Most candidates conflate them and spend 20 minutes trying to fix the wrong problem.


The 45-Minute Clock

T+00 to T+05  Clarify requirements. Confirm instant vs. request-based booking. (Yes, really five minutes.)
T+05 to T+10  Estimate scale. Name the 1000:1 read/write split explicitly.
T+10 to T+18  High-level diagram. Name five services, label both paths.
T+18 to T+32  Data model and full booking flow: Phase 1 hold, Phase 2 payment, recovery.
T+32 to T+40  Search architecture and CDC pipeline.
T+40 to T+45  Bottlenecks, tradeoffs, and interviewer Q&A.

Don't skip requirements. Jumping straight to databases signals you haven't thought about the problem. Five minutes up front pays off through the entire interview.

Hotel Booking System Design: Key Decisions

  • Two paths: search (Elasticsearch, eventual consistency) and booking (Postgres, strong consistency)
  • Inventory model: one row per (room_id, date) with a version field for optimistic locking
  • Booking flow: soft hold via conditional UPDATE → payment → confirm or release via compensating transaction
  • Double booking prevention: conditional UPDATE on the version column, not application-level checks
  • Search: Elasticsearch with geo_distance queries, availability synced via Debezium CDC pipeline from Postgres
  • Caching: Redis with LRU eviction for hot listings; CDN for images
  • Idempotency keys on booking creation and every payment call
