Flash Sale System Design: Sell 10,000 Units to 10 Million Users

You have 10,000 units of a product. Ten million people want one. The sale starts in three seconds.

This is the flash sale system design problem, and it's one of the most instructive questions in the interview canon because it forces you to confront something most systems never have to worry about: the happy path handles 0.1% of traffic and the rejection path handles 99.9% of it. Optimize for rejection, not fulfillment. Your system is, in the most literal sense, a very fast machine for saying no.

What follows is every layer of the design in the order a 45-minute interview should cover it.

What the Interviewer Is Actually Testing

Flash sale questions aren't inventory CRUD dressed up. They are inventory CRUD with a 10-million-person audience and the explicit design requirement that 99.8% of that audience leaves empty-handed. The interviewer is testing three specific things:

Can you identify the core race condition and reason about it precisely?
Do you know how to shed load early, before it reaches expensive resources?
Can you design for the failure case (99.8% of requests fail) as rigorously as the success case?

Get those three right and the rest is detail. The rest is still a lot of detail.

Step 1: Clarify Requirements (Minutes 0-5)

Before drawing a single box, pin down scope. Flash sales aren't all the same.

Functional requirements to nail down:

Users can browse the sale page before the start time
At sale start, users can attempt to purchase up to a per-user limit (typically 1 or 2 units)
The system must not sell more than the available inventory (no overselling, ever)
If a user claims inventory but does not complete payment within a timeout, that inventory returns to the pool
Each user can only place one successful order per sale event

Non-functional requirements:

Availability: 99.99% during the sale window. Downtime during the sale is a direct revenue loss
Consistency: Strong consistency on inventory. Eventual consistency on everything else (catalog, user profiles)
Latency: Rejection responses in under 50ms. Successful order acceptance in under 200ms
Scale: 10M concurrent users, 10K inventory units, 30-second sale window

Scale estimation to say out loud:

Peak page views:    500,000 req/sec  (10M users view the page over a ~20s ramp)
Peak buy attempts:  200,000 req/sec  (roughly half click Buy)
Successful orders:    ~333/sec       (10,000 units / 30s window)
Read-to-write ratio:  1,000:1        (enormous, CDN caching is mandatory)

The most important number: 99.8% of buy attempts will fail. Your system must fail them cheaply.

Step 2: High-Level Architecture (Minutes 5-15)

The right mental model is a funnel of progressively smaller, more expensive gates. Each layer drops traffic by an order of magnitude before passing what's left to the next.

Flash sale architecture: six-layer traffic funnel from CDN edge to PostgreSQL, each gate shedding load before passing requests downstream The funnel. 10 million users enter at the top. 10,000 orders exit at the bottom. Everyone else gets a fast no.

Each layer must be able to reject a request without talking to the layer below it. The CDN rejects stale reads. The rate limiter stops bots without touching the waiting room. The waiting room absorbs overflow without touching inventory. The Inventory Service rejects sold-out attempts without touching the database. For a deeper look at the rate limiting layer, Rate Limiter System Design covers token bucket, sliding window, and the Redis Lua patterns at the edge.

Step 3: Data Model (Minutes 15-20)

PostgreSQL tables

-- The sale event itself
CREATE TABLE sales (
    id           TEXT PRIMARY KEY,
    sku_id       TEXT NOT NULL,
    total_stock  INT NOT NULL,
    max_per_user INT NOT NULL DEFAULT 1,
    start_at     TIMESTAMPTZ NOT NULL,
    end_at       TIMESTAMPTZ NOT NULL
);

-- Ground truth for inventory (Redis is the hot path, this is the source of truth)
CREATE TABLE inventory (
    sale_id          TEXT PRIMARY KEY REFERENCES sales(id),
    total            INT NOT NULL,
    reserved         INT NOT NULL DEFAULT 0,
    version          INT NOT NULL DEFAULT 0   -- for optimistic locking
);

-- One confirmed order per (user, sale) pair
CREATE TABLE orders (
    id               BIGSERIAL PRIMARY KEY,
    idempotency_key  TEXT NOT NULL UNIQUE,
    user_id          BIGINT NOT NULL,
    sale_id          TEXT NOT NULL,
    sku_id           TEXT NOT NULL,
    status           TEXT NOT NULL,  -- PENDING_PAYMENT, CONFIRMED, CANCELLED
    amount_cents     BIGINT NOT NULL,
    created_at       TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (user_id, sale_id)         -- belt-and-suspenders: one order per user per sale
);

The UNIQUE(user_id, sale_id) constraint is your last line of defense. Even if every upstream guard fails, the database will reject a duplicate. It's the database saying: I know what you're doing. No.

Redis keys (populated before the sale starts)

sale:{sale_id}:stock          -- INT, current available inventory
sale:{sale_id}:waitroom       -- SORTED SET, member=user_id, score=arrival_ts
sale:{sale_id}:admitted       -- SET, user_ids allowed through the waiting room
idem:{idempotency_key}        -- STRING, "PENDING" or "DONE" with 1h TTL

Pre-warm rule: Load sale:{sale_id}:stock into Redis at least 5 minutes before the sale starts, not at T=0. A cold cache miss under 200K QPS will cascade. The broader Redis architecture (replication, eviction, consistency tradeoffs) is covered in Distributed Cache System Design.

Step 4: API Design (Minutes 20-22)

GET  /sales/:sale_id
     → 200, cached at CDN for 10s, returns metadata + countdown
     → stock_hint: "low" | "available" (never exact count; see hot key section)

POST /sales/:sale_id/queue
     → enters the waiting room
     → returns { position: 4821, estimated_wait_seconds: 14 }
     → issues a signed JWT with TTL=120s when user is admitted

POST /sales/:sale_id/orders
     Headers: Idempotency-Key: <uuid>, Authorization: Bearer <jwt>
     Body: { sku_id, quantity }
     → 202 Accepted (not 201): order is queued, not yet written
     → returns { order_id, status: "PENDING_PAYMENT" }
     → 409 if idempotency_key already used
     → 410 if JWT expired or already consumed
     → 503 if inventory exhausted

GET  /orders/:order_id
     → poll for CONFIRMED or CANCELLED
     → client polls with exponential backoff (500ms, 1s, 2s, 4s...)

Return 202 Accepted, not 201 Created. The order doesn't exist in the database yet. It's a claim in a queue. This distinction matters for correctness.

Step 5: Preventing Overselling (Minutes 22-32)

This is the core of the interview. Be precise.

The race condition

Without any guard, two concurrent requests can both read stock = 1, both decide "I can proceed," and both write an order. You sold the same unit twice. This bug has been wrecking commerce systems since before most of us started writing software. It is patient. It waits for your launch event.

Thread A: READ stock → 1
Thread B: READ stock → 1
Thread A: WRITE stock → 0, create order for user A  ✓
Thread B: WRITE stock → 0, create order for user B  ✗  (oversold)

Race condition timeline: Thread A and Thread B both read stock=1, then both write orders, producing an oversell Both threads saw stock = 1. Both wrote an order. You now owe someone a product you don't have.

Three approaches exist. Know all three. Recommend the right one for the scale.

Approach 1: Optimistic locking in PostgreSQL

UPDATE inventory
   SET reserved = reserved + 1,
       version  = version + 1
 WHERE sale_id = $1
   AND (total - reserved) >= $2   -- at least $2 units available
   AND version = $3;              -- nobody else modified since I read

-- 0 rows affected → conflict, retry or reject

This works and is correct. It fails gracefully: if the update touches 0 rows, you know someone else got there first. Practical limit: 5,000 to 10,000 writes per second per row before retry storms become a problem. Fine for a sale with a few hundred units. Not fine for 10,000 units at 200,000 concurrent attempts. Two hundred thousand threads simultaneously retrying the same version field is not a distributed system. It is a distributed argument.

Approach 2: Redis Lua atomic decrement (recommended for most scales)

Redis executes each Lua script as a single, uninterruptible operation. No other command runs while the script is executing. This gives you read-modify-write atomicity at Redis speeds: around 100,000 operations per second on a single node.

-- KEYS[1] = "sale:{sale_id}:stock"
-- KEYS[2] = "sale:{sale_id}:admitted:{user_id}"
-- ARGV[1] = quantity requested

local admitted = redis.call("GET", KEYS[2])
if not admitted then
    return -2  -- user not admitted through waiting room
end

local stock = tonumber(redis.call("GET", KEYS[1]))
if stock == nil or stock < tonumber(ARGV[1]) then
    return -1  -- sold out
end

redis.call("DECRBY", KEYS[1], ARGV[1])
redis.call("DEL", KEYS[2])   -- consume the admission token (one use)
return 1       -- success

The Lua script checks three things atomically: is the user admitted, is there stock, and if so, it decrements and revokes the admission token. No race. No oversell. No database touched.

Approach 3: Token gate (for extreme scale or heavy bot pressure)

Before the sale, RPUSH sale:{sale_id}:tokens exactly N times, where N is the inventory count. Each LPOP removes one token. The list can't go below empty. Overselling is mathematically impossible.

# Pre-warm: push exactly N tokens
redis.rpush(f"sale:{sale_id}:tokens", *range(10000))

# At purchase attempt:
token = redis.lpop(f"sale:{sale_id}:tokens")
if token is None:
    return 503, "Sold out"
# token is a unique claim; proceed to order queue

The token gate is the strongest guarantee. The tradeoff: a crashed Redis node before inventory is persisted to the database means those tokens are gone. Use Redis AOF persistence with appendfsync everysec and a sync replica.

Which to use?

Scale	Approach
< 5K concurrent attempts, < 1K units	PostgreSQL optimistic lock
Up to 100K concurrent attempts	Redis Lua decrement
100K+ concurrent attempts or heavy bots	Token gate, possibly sharded
Belt-and-suspenders on all cases	DB UNIQUE constraint as final guard

Step 6: The Virtual Waiting Room (Minutes 32-37)

A waiting room converts a 200,000 req/sec spike into a 20,000 req/sec controlled stream. Without it, your inventory service is the buffet table the moment the doors open. With it, you have an orderly line. A line that cannot break your system. Ticketmaster uses the same pattern for presales: see Ticketmaster System Design for how it handles seat-level contention.

How it works

# User arrives
ZADD sale:{sale_id}:waitroom arrival_timestamp user_id

# Periodic admission loop (runs every 100ms)
# Admit a batch sized to your downstream throughput
batch = ZPOPMIN sale:{sale_id}:waitroom COUNT 2000
for user_id in batch:
    SET sale:{sale_id}:admitted:{user_id} 1 EX 120
    notify_user(user_id, "you_are_admitted")

The sorted set ordered by arrival timestamp ensures FIFO fairness. ZRANK gives each user their queue position in O(log N).

# User polls for position
position = ZRANK sale:{sale_id}:waitroom user_id
estimated_wait = position / admission_rate_per_second

Virtual waiting room: Redis sorted set showing users ordered by arrival timestamp, ZPOPMIN admitting a batch, and admitted users receiving a signed JWT and SSE notification The waiting room. Score = arrival timestamp. ZPOPMIN pops the earliest arrivals. Each admitted user gets a signed JWT valid for 120 seconds.

Admission rate controls everything downstream. Set it to the throughput your inventory service and order queue can absorb. A good starting point: inventory count divided by the sale window in seconds. For 10,000 units over 30 seconds, that's ~333 admissions/sec. Over-admit by 10x (3,300/sec) to account for users who abandon.

Soft capacity ceiling

If the waiting room sorted set exceeds 20 million entries, stop accepting new users: ZCARD sale:{sale_id}:waitroom > 20_000_000 → return 429. Prevents unbounded memory growth.

Step 7: The Hot Key Problem (Minutes 37-40)

A single Redis key for sale:{sale_id}:stock means all 200,000 req/sec pound the same slot. Redis is single-threaded per slot. You will hit the ceiling. The ceiling does not move.

Solution: shard the inventory counter.

sale:iphone15-sale:stock:0  → 625 units
sale:iphone15-sale:stock:1  → 625 units
...
sale:iphone15-sale:stock:15 → 625 units   (16 shards × 625 = 10,000)

Each purchase attempt picks a random shard with random.randint(0, 15). Load distributes across 16 Redis slots. If a shard hits zero, the script tries the next one. If all shards hit zero, the sale is over.

Hot key sharding: single Redis stock key causing bottleneck (left) versus 16 sharded keys each holding 625 units, selected via random.randint(0,15) (right) One key handles 200K req/s on a single Redis thread. Sixteen keys spread that load across 16 slots. Simple math, significant difference.

The UI never shows exact inventory. Show "Available" until 10% remains, then "Low stock." Exact counts invite bots to time their requests to the millisecond.

Step 8: Async Order Processing (Minutes 40-42)

The Inventory Service never writes directly to the database. It publishes to Kafka and returns 202 Accepted. The database never sees the spike.

Inventory Service  →  Kafka topic: order-intents  →  Order Workers  →  PostgreSQL

Order workers consume from Kafka and write with:

INSERT INTO orders (idempotency_key, user_id, sale_id, sku_id, status, amount_cents)
VALUES ($1, $2, $3, $4, 'PENDING_PAYMENT', $5)
ON CONFLICT (idempotency_key) DO NOTHING;

After writing, the worker calls the payment gateway. If payment succeeds: update status to CONFIRMED. If payment fails or times out after 10 minutes: update status to CANCELLED and return the unit to inventory.

# Return inventory on cancellation
INCRBY sale:{sale_id}:stock 1       # restore Redis counter
UPDATE inventory SET reserved = reserved - 1 WHERE sale_id = $1  # restore DB

The reservation TTL means abandoned carts can't permanently drain stock. A user who claims inventory and disappears doesn't block someone who would have bought.

Where This Breaks (and How to Stop It)

Everything breaks eventually. Here is what breaks first and why.

Failure	Impact	Fix
Redis primary loses memory	Inventory count reset to 0 or ∞	AOF `appendfsync everysec`, sync replica, periodic DB reconciliation
DB connection pool saturates	Orders queue up, timeouts	Kafka absorbs burst; DB write rate is bounded by Kafka consumer throughput
Kafka broker unavailable	Orders lost	Producer `acks=all`, outbox table as fallback
Payment gateway slow	Reservations held too long	10-minute TTL + compensating `INCRBY` on timeout
Bot wave drains tokens in 50ms	Legitimate users get nothing	Rate limit per IP + per account age + per device fingerprint before waiting room
Duplicate clicks	Double order	Redis SETNX idempotency + DB UNIQUE constraint

The Redis-DB reconciliation deserves emphasis. Run a periodic job (every 60 seconds during the sale) that compares sale:{sale_id}:stock against inventory.total - inventory.reserved. If they diverge by more than 1%, the Redis counter is authoritative during the sale but the DB is authoritative for audit. After the sale, reconcile and correct.

Flash Sale System Design: 45-Minute Interview Clock

0-5   Clarify: scale (10M users, 10K units), consistency (strong on inventory),
      per-user limit, reservation TTL, multi-region or single
5-15  High-level: draw the 6-layer funnel, name each box, explain why each
      layer exists before the next
15-20 Data model: sales, inventory (with version), orders (dual UNIQUE constraints)
20-32 Core deep dive: the race condition, optimistic lock, Redis Lua script,
      token gate; explain the tradeoff between them
32-37 Waiting room: Redis sorted set, admission rate math, FIFO fairness, capacity ceiling
37-40 Hot key: sharding to N Redis keys, why exact count in UI invites bots
40-42 Async order processing: Kafka 202 pattern, reservation TTL, compensating writes
42-45 Tradeoffs and follow-up questions

If the interviewer pushes for multi-region: each region owns a slice of inventory (US=60%, EU=30%, APAC=10%). A reconciler rebalances unsold stock between regions every 30 seconds. Alternatively, route all DECR calls through one authoritative region and accept the added latency.

Tradeoffs

Decision	Option A	Option B	When to pick A
Inventory write path	Redis Lua	PostgreSQL optimistic lock	> 5K concurrent attempts
Waiting room	Redis sorted set	No waiting room	Any sale where demand > inventory
Order acceptance	202 async	201 synchronous	Latency matters; DB can't absorb the spike
Reservation timeout	10 min	No timeout	Always set a timeout; abandoned carts drain stock otherwise
Inventory sharding	16 Redis keys	1 Redis key	Hot SKU with > 50K req/sec
Multi-region	Proportional allocation	Single authoritative region	Reduces cross-region latency at the cost of rebalancing complexity

The Six Things That Matter

None of these are optional. All of them will be tested.

The core problem is the read-modify-write race. Solve it with a Redis Lua atomic script or a token gate, not with a SELECT FOR UPDATE.
Shed load at the edge before it reaches inventory. CDN for reads, rate limiter for bots, waiting room for legitimate overflow.
Return 202 Accepted. The order doesn't exist yet. Kafka decouples claim from payment.
Two UNIQUE constraints in the database. Idempotency key plus (user_id, sale_id). They're your last line of defense and they cost nothing.
Reservation TTL prevents stock hoarding. If a user doesn't pay in 10 minutes, their unit returns to the pool.
Never show exact stock count. "Available" and "Low stock" are enough. Exact counts are a timing signal for bots.

If you want to practice this kind of design under real interview pressure, SpaceComplexity runs voice-based mock interviews with rubric scoring on communication, problem decomposition, and technical depth.