Ticketmaster System Design Interview: Sell Millions of Seats Without Selling the Same One Twice

May 27, 2026 · 12 min read
TL;DR
  • Two consistency models in one system: CP (strong consistency) for booking and AP (high availability) for browsing keep correctness and scale from conflicting.
  • Redis SET NX EX lock: atomic and auto-expiring, it prevents double-booking without holding a database connection open for the full checkout window.
  • Postgres UNIQUE INDEX is your final guarantee: the database constraint catches any race the application lock misses, so you never depend on a single layer.
  • Virtual waiting room via Redis sorted set: meters millions of concurrent users into the Booking Service at a controlled rate, with position updates pushed via Server-Sent Events.
  • Two-phase booking separates inventory from payment: a 10-minute hold reserves the seat while the user pays, and idempotency keys ensure no card is charged twice.
  • Never cache seat availability: cache event listings and venue layout aggressively, but always read seat status from the Postgres primary.

On November 15, 2022, Ticketmaster received 3.5 billion requests in a single day for Taylor Swift's Eras Tour presale. Fourteen million users and bots competed for 2.4 million tickets. The system buckled. Senate hearings followed.

The interesting part isn't the traffic. Traffic is a scaling problem, and scaling problems have well-understood solutions. The interesting part is the seat. Two users see seat A7 as available. Both click it at the same moment. Only one can have it. And somewhere between that click and the payment confirmation, you have to make that guarantee hold under millions of concurrent requests.

That's the real Ticketmaster system design interview. Here's how to answer it.

[Image: a VAX 11/780 minicomputer with text describing the Ticketmaster system: C and assembler on ancient VMS hardware, developers long dead or retired, and a room where a goat is left every two weeks.]

From r/ProgrammerHumor. The actual Ticketmaster codebase, reportedly. The goat handles cache invalidation.


Scope the Problem First

A system design interview lives or dies on how well you bound it. Spend the first three to five minutes here.

Functional requirements:

  • Browse events by date, location, and category
  • View a venue seat map with real-time availability
  • Reserve specific seats, held temporarily while the user pays
  • Complete purchase via integrated payment
  • Receive a confirmation and digital ticket

Non-functional requirements. These drive every architectural choice:

  • Strong consistency for reservations. No double-booking, ever.
  • High availability for browsing. Users can always search.
  • Low latency for the seat map display (under 200ms).
  • Massive read-to-write ratio: roughly 99% browse, 1% buy.
  • Spike tolerance: one popular event can generate 10x normal traffic in seconds.

You're accepting a CP design (consistent, partition-tolerant) for the booking path and an AP design (available, partition-tolerant) for the search path. Say that out loud. Interviewers respond well when you name the CAP tradeoff before drawing a single box.


Numbers That Drive the Design

Don't over-engineer these. Show you can think at scale.

  • 10 million daily active users
  • 100,000 events live at any moment; up to 50,000 seats per venue
  • Seat record: ~200 bytes → 50,000 seats × 100,000 events = ~1 TB of seat data
  • Peak booking load during a major on-sale: 1,000+ concurrent reservation attempts per second against a single event

The reads are enormous and mostly cacheable. The writes are rare but must be exact.
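
The storage figure above is easy to check with a few lines of arithmetic. This is a back-of-envelope sketch using the article's own numbers; the variable names are illustrative, and it assumes the worst case where every live event fills a 50,000-seat venue.

```python
# Back-of-envelope check of the seat-storage estimate.
# Assumption: every live event uses the maximum venue size (worst case).
SEAT_RECORD_BYTES = 200
SEATS_PER_VENUE = 50_000
LIVE_EVENTS = 100_000

total_seats = SEATS_PER_VENUE * LIVE_EVENTS
total_bytes = total_seats * SEAT_RECORD_BYTES
total_terabytes = total_bytes / 10**12  # decimal terabytes

print(f"{total_seats:,} seat rows, about {total_terabytes:.1f} TB")
# prints "5,000,000,000 seat rows, about 1.0 TB"
```

Real venues are usually far smaller than the cap, so 1 TB is an upper bound rather than a forecast.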


Three Traffic Paths, Three Consistency Budgets

Not all requests are equal. Browse traffic can tolerate stale data by hours. Seat reservations cannot tolerate stale data by milliseconds.

[Diagram: three separate traffic paths. The browse path (AP, cached, via CDN and read replicas), the seat map path (reads the primary directly), and the booking path (CP, via Redis NX lock and Postgres primary with UNIQUE INDEX).]

Two consistency models, one system. The browse path is generous with staleness. The booking path trusts exactly one thing: the Postgres primary.

Browse path (high volume, stale-ok): requests hit the CDN first, then read replicas and a Redis cache for event listings. Cache TTLs are generous: 24 hours for event details, one hour for show schedules.

Seat map path (medium volume, freshness needed): queries the Postgres primary for seat status. You never cache seat availability here. You'll see why in a moment.

Booking path (low volume, must be exact): goes through the Booking Service to the Postgres primary, with Redis mediating the seat lock. For major on-sale events, a Virtual Queue sits in front of the Booking Service as a traffic buffer.


A Schema That Cannot Double-Book

Keep the schema tight. Define enough to show you've thought about cardinality and the constraints that prevent double-booking.

-- Postgres DDL. Enumerated columns are written as CHECK constraints.

-- One record per event (concert, game, show)
CREATE TABLE events (
  event_id   BIGINT PRIMARY KEY,
  title      VARCHAR(255),
  venue_id   BIGINT,
  start_time TIMESTAMP,
  status     VARCHAR(16) CHECK (status IN ('ACTIVE', 'CANCELLED', 'SOLD_OUT'))
);

-- One record per physical seat per event
CREATE TABLE seats (
  seat_id     BIGINT PRIMARY KEY,
  event_id    BIGINT REFERENCES events(event_id),
  section     VARCHAR(20),
  row         VARCHAR(10),
  number      INT,
  status      VARCHAR(16) CHECK (status IN ('AVAILABLE', 'RESERVED', 'CONFIRMED')),
  reserved_by BIGINT,     -- user_id, null when available
  reserved_at TIMESTAMP   -- when the hold started
);
CREATE INDEX seats_event_status_idx ON seats (event_id, status);

-- Permanent record of a completed purchase
CREATE TABLE bookings (
  booking_id BIGINT PRIMARY KEY,
  user_id    BIGINT,
  event_id   BIGINT,
  status     VARCHAR(16) CHECK (status IN ('PENDING', 'CONFIRMED', 'CANCELLED')),
  created_at TIMESTAMP
);

-- Which seats belong to which booking
CREATE TABLE booking_seats (
  booking_id BIGINT,
  seat_id    BIGINT,
  PRIMARY KEY (booking_id, seat_id)
);
-- One seat per booking, globally
CREATE UNIQUE INDEX booking_seats_seat_id_idx ON booking_seats (seat_id);

The UNIQUE INDEX (seat_id) on booking_seats is your final guarantee. Even if the application layer has a race condition, the database rejects the duplicate row. State this explicitly. It's belt-and-suspenders architecture, and interviewers notice when you say it.
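
To see the constraint do its job, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made so the example runs anywhere; the unique-constraint behavior it shows is the same idea). The `claim_seat` helper is hypothetical.

```python
import sqlite3

# SQLite stands in for Postgres here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE booking_seats (
        booking_id INTEGER NOT NULL,
        seat_id    INTEGER NOT NULL,
        PRIMARY KEY (booking_id, seat_id),
        UNIQUE (seat_id)
    )
""")

def claim_seat(booking_id: int, seat_id: int) -> bool:
    """Return True if this booking now owns the seat, False if it was taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO booking_seats (booking_id, seat_id) VALUES (?, ?)",
                (booking_id, seat_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a second owner for this seat.
        return False

first = claim_seat(booking_id=1, seat_id=123)   # True
second = claim_seat(booking_id=2, seat_id=123)  # False
```

The second booking fails with a constraint error rather than silently succeeding, which is the behavior you want the application to translate into a 409.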


Two Phases on Purpose

Four endpoints carry the flow from discovery to purchase.

# Discovery
GET /events?city=NYC&date=2026-06-01
  → paginated event list

# Seat map
GET /events/{event_id}/seats
  → each seat's section, row, number, status

# Reserve seats (starts the 10-minute hold)
POST /bookings
  body: { event_id, seat_ids: [123, 124], user_id }
  → { booking_id, expires_at }

# Confirm purchase
POST /bookings/{booking_id}/confirm
  body: { payment_token }
  → { confirmation_number, ticket_url }

Seat inventory is cheap. Charging a card twice is not. The two-step flow (reserve, then confirm) separates the inventory hold from payment risk. The hold window (typically 10 minutes) is your checkout clock.
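
The two phases can be sketched as a small state machine. This is an in-memory illustration only: the `Booking` class and the `reserve` and `confirm` functions are hypothetical names, and a real service would persist this state in Postgres.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

HOLD_WINDOW = timedelta(minutes=10)

@dataclass
class Booking:
    booking_id: int
    seat_ids: list
    expires_at: datetime
    status: str = "PENDING"

def reserve(booking_id: int, seat_ids: list, now: datetime) -> Booking:
    """Phase 1: hold the seats and start the checkout clock. No money moves."""
    return Booking(booking_id, seat_ids, expires_at=now + HOLD_WINDOW)

def confirm(booking: Booking, now: datetime) -> bool:
    """Phase 2: accept the purchase only while the hold is still live."""
    if booking.status != "PENDING":
        return False
    if now >= booking.expires_at:
        booking.status = "CANCELLED"
        return False
    booking.status = "CONFIRMED"
    return True

t0 = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
on_time = reserve(1, [123, 124], now=t0)
too_late = reserve(2, [125], now=t0)
paid = confirm(on_time, now=t0 + timedelta(minutes=9))      # True
expired = confirm(too_late, now=t0 + timedelta(minutes=11))  # False
```

Passing `now` in explicitly keeps the clock testable; the payment call itself would sit inside `confirm` in a fuller version.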


The Hard Problem: Preventing Double-Booking

Two users see seat A7 as AVAILABLE. Both click it at the same millisecond. Both read the same row. Without coordination, both write a booking. Your database is not psychic. It just does what it's told.

Three approaches. One winner.

Pessimistic locking uses SELECT ... FOR UPDATE to lock the database row the moment a user selects a seat. Safe. But each lock holds a connection for the entire checkout duration. A 10-minute payment flow holding a database connection is expensive at scale, and it doesn't compose across microservices.

Optimistic locking adds a version column. When you update, you assert the version hasn't changed since your read. If another user committed first, your transaction rolls back and you retry. Works fine at low contention. At high contention (1,000 people fighting for the same row), you get 999 failed transactions and a retry storm that amplifies load at exactly the worst moment.
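
A minimal sketch of the version-column check, using sqlite3 as a stand-in for Postgres so it runs on its own. The `version` column and the helper names are assumptions for illustration; they are not part of the schema shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (seat_id INTEGER PRIMARY KEY, status TEXT, "
    "reserved_by INTEGER, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO seats VALUES (123, 'AVAILABLE', NULL, 0)")
conn.commit()

def read_version(seat_id: int) -> int:
    row = conn.execute(
        "SELECT version FROM seats WHERE seat_id = ?", (seat_id,)
    ).fetchone()
    return row[0]

def try_reserve(seat_id: int, user_id: int, expected_version: int) -> bool:
    """Succeed only if nobody has changed the row since we read it."""
    with conn:
        cur = conn.execute(
            """
            UPDATE seats
            SET status = 'RESERVED', reserved_by = ?, version = version + 1
            WHERE seat_id = ? AND version = ? AND status = 'AVAILABLE'
            """,
            (user_id, seat_id, expected_version),
        )
    return cur.rowcount == 1

# Both users read version 0 before either one writes.
version_seen_by_a = read_version(123)
version_seen_by_b = read_version(123)
a_won = try_reserve(123, user_id=1, expected_version=version_seen_by_a)  # True
b_won = try_reserve(123, user_id=2, expected_version=version_seen_by_b)  # False
```

User B's update matches zero rows because the version has moved on. Multiply that losing path by a thousand contenders and you get the retry storm described above.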

Distributed lock via Redis. Use this.

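# Assumes `redis` is a redis-py client and `db` is a DB-API cursor on the Postgres primary.
# Redis SET with NX (only if Not eXists) and EX (expiry in seconds)
lock_key = f"seat:{event_id}:{seat_id}"
acquired = redis.set(lock_key, user_id, ex=600, nx=True)
if not acquired:
    return {"error": "Seat is no longer available"}

# Lock acquired: write RESERVED status to Postgres
db.execute(
    """
    UPDATE seats
    SET status = 'RESERVED', reserved_by = %s, reserved_at = NOW()
    WHERE seat_id = %s AND status = 'AVAILABLE'
    """,
    [user_id, seat_id],
)
if db.rowcount == 0:
    # Postgres says the seat was already taken: drop our lock and report the conflict
    redis.delete(lock_key)
    return {"error": "Seat is no longer available"}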

[Diagram: two users racing for seat A7. User A wins the Redis SET NX EX operation and proceeds to write to Postgres. User B gets nil back immediately and receives a 409. A second layer shows the Postgres UNIQUE INDEX catching anything that slips through if Redis fails.]

Layer 1 (Redis) is fast. Layer 2 (Postgres UNIQUE INDEX) is correct. You need both because neither alone is sufficient.

The Redis lock is temporary. The Postgres record is the truth. SET with the NX flag is a single atomic command: only the first caller succeeds, and every subsequent caller gets nil back immediately without waiting. A Redis SET typically completes in well under a millisecond, and the TTL is automatic. No cron job is needed to release expired locks.
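
The first-caller-wins and auto-expiry semantics can be illustrated without a Redis server. The `FakeRedis` class below is a hypothetical in-memory stand-in that mimics only `SET key value NX EX seconds`, returning `True` or `None` in the style of the redis-py client; it is not the real client, and the injected clock exists purely to make expiry observable.

```python
import time

class FakeRedis:
    """Tiny in-memory stand-in for SET with NX and EX. Illustration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ex, nx=False):
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] <= now:
            del self._data[key]  # lazily drop an expired key
            current = None
        if nx and current is not None:
            return None  # someone else holds the key
        self._data[key] = (value, now + ex)
        return True

fake_now = [0.0]
r = FakeRedis(clock=lambda: fake_now[0])
key = "seat:42:123"

user_a = r.set(key, "user_a", ex=600, nx=True)  # True: first caller wins
user_b = r.set(key, "user_b", ex=600, nx=True)  # None: rejected immediately
fake_now[0] = 601.0                              # the 10-minute hold lapses
user_c = r.set(key, "user_c", ex=600, nx=True)  # True: seat is claimable again
```

Real Redis gets the same guarantee from executing each command atomically, so the check and the write cannot be interleaved by another client.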

When the 10 minutes expire and the user hasn't paid, the Redis key disappears on its own, but the Postgres row still says RESERVED. A background job scans for seats with status = 'RESERVED' and reserved_at < NOW() - INTERVAL '10 minutes' and resets them to AVAILABLE.
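
A sketch of that background sweep, again with sqlite3 standing in for Postgres. To keep it deterministic, timestamps are plain integers in seconds and the current time is passed in; the `release_expired_holds` name is hypothetical.

```python
import sqlite3

HOLD_SECONDS = 600  # the 10-minute hold window

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (seat_id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "reserved_by INTEGER, reserved_at INTEGER)"
)
conn.executemany(
    "INSERT INTO seats VALUES (?, ?, ?, ?)",
    [
        (1, "RESERVED", 10, 0),    # held at t=0, stale by t=700
        (2, "RESERVED", 11, 500),  # held at t=500, still live at t=700
        (3, "CONFIRMED", 12, 0),   # paid for, must never be released
    ],
)
conn.commit()

def release_expired_holds(now: int) -> int:
    """Reset stale RESERVED seats to AVAILABLE and return how many changed."""
    with conn:
        cur = conn.execute(
            """
            UPDATE seats
            SET status = 'AVAILABLE', reserved_by = NULL, reserved_at = NULL
            WHERE status = 'RESERVED' AND reserved_at < ?
            """,
            (now - HOLD_SECONDS,),
        )
    return cur.rowcount

released = release_expired_holds(now=700)  # 1
```

Filtering on `status = 'RESERVED'` in the same statement is what keeps the sweep from ever touching a confirmed seat.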

The Redis lock is a performance optimization. The database constraint is a correctness guarantee. If the lock ever fails (Redis restarts, or a key expires early) and two concurrent writes race to the database, the second hits the unique constraint and returns a clean error. In the normal case the Redis lock keeps database pressure low. In the failure case the constraint keeps the data correct. You don't depend on either alone.


How You Survive a Taylor Swift On-Sale

For ordinary events, the Booking Service handles load fine. For a Taylor Swift on-sale, 14 million concurrent users arrive in the first minute. You cannot scale to that. You should not try. The correct answer is to queue them.

The waiting room meters traffic into the Booking Service. Users are placed in a FIFO queue on arrival. The system admits a controlled batch per second, sized to what the Booking Service can actually process without degrading.

User arrives → assigned a queue token
            ↓
Redis sorted set (ZADD queue_key <arrival_timestamp> <user_token>)
            ↓
Admission controller checks ZRANK queue_key <user_token>
            ↓
When rank < admission_threshold → issue access token → booking page

[Diagram: the virtual waiting room flow. 14 million users arrive and enter a Redis sorted set queue. An admission controller drains at roughly 1000 users per second using ZRANK checks. Admitted users flow to the Booking Service. Server-Sent Events push queue position updates back to each user's browser.]

The queue is the system. Without it, 14 million people crush the Booking Service in the first 30 seconds. With it, they wait in an orderly line and the Booking Service stays healthy.
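
The queue mechanics can be sketched in memory. The `WaitingRoom` class below is a hypothetical stand-in for the Redis sorted set, not a Redis client: `join` plays the role of ZADD and `rank` the role of ZRANK. For admission it pops the oldest arrivals (comparable to ZPOPMIN), which is a swapped-in variation on the rank-threshold check in the flow above.

```python
import bisect

class WaitingRoom:
    """In-memory stand-in for a Redis sorted set scored by arrival time."""

    def __init__(self):
        self._entries = []  # kept sorted as (arrival_ts, token)

    def join(self, token: str, arrival_ts: float) -> None:
        # Comparable to: ZADD queue_key <arrival_ts> <token>
        bisect.insort(self._entries, (arrival_ts, token))

    def rank(self, token: str):
        # Comparable to ZRANK: 0-based position, or None if not queued.
        # This scan is O(N); the Redis sorted set does it in O(log N).
        for position, (_, queued) in enumerate(self._entries):
            if queued == token:
                return position
        return None

    def admit(self, batch_size: int) -> list:
        # Release the oldest arrivals to the Booking Service.
        admitted = [token for _, token in self._entries[:batch_size]]
        del self._entries[:batch_size]
        return admitted

room = WaitingRoom()
room.join("u1", arrival_ts=1.0)
room.join("u3", arrival_ts=3.0)
room.join("u2", arrival_ts=2.0)

position_before = room.rank("u3")  # 2
admitted = room.admit(2)           # ["u1", "u2"]
position_after = room.rank("u3")   # 0
```

Arrival order, not request order, decides who gets in, which is where the fairness of the waiting room comes from.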

Redis sorted sets give O(log N) insert and rank lookup. Your queue holds millions of entries. Position updates are pushed to the client via Server-Sent Events (not polling). SSE is the right call here because you only need server-to-client communication. WebSockets are bidirectional and cost more.
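
On the wire, a Server-Sent Events message is just text: an optional `event:` line, a `data:` line, and a blank line to end the message. A minimal sketch of formatting a position update follows; the `queue_position` event name and the payload fields are assumptions for illustration.

```python
import json

def sse_message(event: str, payload: dict) -> str:
    """Format one Server-Sent Events message with a JSON payload."""
    body = json.dumps(payload, separators=(",", ":"))
    # A blank line terminates the message.
    return f"event: {event}\ndata: {body}\n\n"

msg = sse_message("queue_position", {"position": 48213, "eta_seconds": 49})
print(msg, end="")
# prints:
# event: queue_position
# data: {"position":48213,"eta_seconds":49}
```

The browser side needs only an `EventSource` listener for that event name, with no client-to-server channel to maintain.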

The admission rate is a throttle. Size it to your Booking Service throughput. During high-demand events, show users an estimated wait time based on their queue position and the current drain rate. Honest feedback is better than a spinning loader that never resolves. Congress agreed on that one.
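
The wait estimate is a single division. A minimal sketch, assuming a steady drain rate and a hypothetical helper name:

```python
import math

def estimated_wait_seconds(position: int, admits_per_second: float) -> int:
    """Estimate the wait for a user at `position` given the current drain rate."""
    if admits_per_second <= 0:
        raise ValueError("admission rate must be positive")
    return math.ceil(position / admits_per_second)

back_of_line = estimated_wait_seconds(14_000_000, admits_per_second=1000)  # 14000
near_front = estimated_wait_seconds(1_500, admits_per_second=1000)         # 2
```

At 1,000 admissions per second, the last of 14 million users waits about 14,000 seconds, a little under four hours. That number is worth showing to the user rather than hiding.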


Never Charge a Card Twice

Network timeouts happen. A user might retry a payment request after their connection drops. You need a guarantee that you never charge twice.

Every payment attempt gets a unique idempotency key, scoped to that booking.

key: booking_R789456_attempt_1697123456

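Generate the key once per checkout attempt and resend the same key on every retry of that attempt. A retry that minted a fresh key would look like a brand-new payment. Before charging, check whether this key exists in the payments table. If it exists and the charge succeeded, return the stored result. If the charge failed, allow the retry. Major payment processors (Stripe, Adyen) support idempotency keys at the API level, so the guarantee extends through to the processor.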
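
The check-before-charge logic can be sketched with a dictionary standing in for the payments table. `PaymentLedger` and `fake_charge` are hypothetical names, and a production version would enforce the lookup and insert atomically (for example with a unique constraint on the key) rather than with an in-process dict.

```python
class PaymentLedger:
    """In-memory stand-in for a payments table keyed by idempotency key."""

    def __init__(self, charge_fn):
        self._charge = charge_fn
        self._succeeded = {}  # idempotency_key -> stored result

    def pay(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._succeeded:
            # A retry of a charge that already went through: replay the result.
            return self._succeeded[idempotency_key]
        # If the charge raises, nothing is stored and a retry is allowed.
        result = self._charge(amount_cents)
        self._succeeded[idempotency_key] = result
        return result

charges_made = []

def fake_charge(amount_cents: int) -> dict:
    charges_made.append(amount_cents)
    return {"status": "succeeded", "amount_cents": amount_cents}

ledger = PaymentLedger(fake_charge)
first = ledger.pay("booking_R789456_attempt_1697123456", 25_000)
retry = ledger.pay("booking_R789456_attempt_1697123456", 25_000)
```

Both calls return the same result, and the card is charged once.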


Cache Everything Except Seat Status

Never cache seat availability. Cache everything else aggressively.

  • Event listings: 24-hour TTL in Redis. Invalidate on event update.
  • Venue layout (static seat positions, sections): CDN-eligible, very long TTL.
  • Seat status: never cached. Always read from Postgres primary.

The seat map view can tolerate a one to two second lag for visual rendering. What cannot tolerate lag is the reservation check. That always goes to the primary. Read more about LRU cache design if you want to think through what to evict when cache pressure builds.
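
One way to make the rule hard to break is to encode it in the cache layer itself. The sketch below is a hypothetical read-through cache in which any data kind without a configured TTL, seat status included, always falls through to the loader. The kind names and the `ReadThroughCache` class are assumptions for illustration.

```python
import time

# Seat status is deliberately absent: no TTL means "never cache".
TTL_SECONDS = {
    "event_listing": 24 * 3600,
    "show_schedule": 3600,
}

class ReadThroughCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # (kind, key) -> (value, expires_at)

    def get(self, kind, key, load):
        ttl = TTL_SECONDS.get(kind)
        if ttl is None:
            return load()  # uncacheable kind: always hit the source of truth
        now = self._clock()
        hit = self._store.get((kind, key))
        if hit is not None and hit[1] > now:
            return hit[0]
        value = load()
        self._store[(kind, key)] = (value, now + ttl)
        return value

load_count = [0]

def load():
    load_count[0] += 1
    return load_count[0]

cache = ReadThroughCache()
listing_1 = cache.get("event_listing", "e1", load)  # 1 (loaded)
listing_2 = cache.get("event_listing", "e1", load)  # 1 (served from cache)
seat_1 = cache.get("seat_status", "s1", load)       # 2 (loaded)
seat_2 = cache.get("seat_status", "s1", load)       # 3 (loaded again)
```

Making "never cache" the default for unknown kinds means a new data type has to opt in to staleness rather than inherit it by accident.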


The Ticketmaster System Design Interview: Your 45-Minute Clock

Rough breakdown that works:

Time         Activity
0-5 min      Requirements and scope, state CAP tradeoff
5-10 min     Capacity estimates, high-level sketch
10-20 min    Data model, API design
20-35 min    Double-booking problem (Redis lock + DB constraint)
35-42 min    Virtual queue, caching, payment idempotency
42-45 min    Tradeoffs, what you'd improve with more time

The double-booking section is where you differentiate yourself. Most candidates wave at "use a database transaction." The follow-up is always: what if Redis goes down? The answer is your Postgres unique constraint catches it. You don't rely on any single layer. That's the story.


Four Tradeoffs Worth Stating Out Loud

Consistency vs. availability on the booking path. You chose CP. A user who can't book during a brief degradation is better than two users who both believe they own the same seat. Say this directly.

Pessimistic vs. distributed locking. Pessimistic locking is simpler but ties up database connections. Distributed locks via Redis are faster under high concurrency and expire automatically. The tradeoff is operational: you now have Redis as a critical dependency in the booking path.

Waiting room vs. crashing. A virtual queue creates friction. Crashing creates distrust. The Eras Tour taught the industry which is worse. The queue also provides fairness guarantees you cannot get from raw first-come-first-served at the database level.

Two-phase booking vs. one-step checkout. Two phases let you hold inventory without charging. The cost is a window of "limbo" seats (reserved but unpaid). At 1,000 concurrent active checkouts with 10-minute holds, that's up to 1,000 temporarily unavailable seats. For most events that's acceptable. For a sold-out 5,000-seat venue, you'd want to reduce the hold window. Read through the tradeoff maze if you want a framework for communicating these kinds of push-pull decisions.
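
The limbo cost is easy to quantify. A minimal sketch, assuming each active checkout holds a fixed number of seats (one seat each reproduces the figure above); the function name is hypothetical.

```python
def held_seat_fraction(active_checkouts: int, seats_per_checkout: int,
                       venue_capacity: int) -> float:
    """Fraction of a venue's inventory sitting in reserved-but-unpaid limbo."""
    held = min(active_checkouts * seats_per_checkout, venue_capacity)
    return held / venue_capacity

stadium = held_seat_fraction(1_000, seats_per_checkout=1, venue_capacity=50_000)   # 0.02
small_venue = held_seat_fraction(1_000, seats_per_checkout=1, venue_capacity=5_000)  # 0.2
```

Two percent of a stadium in limbo is noise. Twenty percent of a 5,000-seat venue is a visible share of inventory, which is the case for a shorter hold window there.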


The Full Design in Six Lines

  • Three traffic paths, two consistency models: CP for booking, AP for browse.
  • Redis distributed locks (SET NX EX) provide fast, auto-expiring seat holds.
  • The UNIQUE INDEX in Postgres is your final correctness guarantee, not the application lock.
  • A virtual waiting room (Redis sorted set, leaky bucket admission) is how you survive a Taylor Swift sale.
  • Two-phase booking (reserve, then pay) separates inventory hold from payment risk.
  • Never cache seat availability. Cache everything else.

If you want to practice walking through this kind of design out loud, under time pressure, and getting specific feedback on where your explanation lost the interviewer, SpaceComplexity runs voice-based mock system design sessions with rubric-based scoring. The gap between knowing this design and explaining it fluently in 45 minutes is real. It's a trainable skill.


Further Reading