Design an Inventory Reservation System: The 45-Minute Walkthrough

Two users hit "Buy Now" at exactly the same instant. One item left. Both get a confirmation email. You've oversold your inventory, and now one customer is fuming while the other is unboxing nothing.

Inventory reservation looks simple. It's a concurrency problem wearing a shopping cart disguise.

That's the interview. You're not being asked to build a database schema. You're being asked how you stop two threads from both believing they have the last unit. Everything else follows from that question.

Requirements: What You're Actually Building

Start here before drawing anything. Five minutes of clarification saves you from solving the wrong problem. (You do not want to spend 35 minutes building a globally distributed, strongly-consistent inventory system for a startup that sells 200 artisan candles. Ask first.)

Ask:

How many SKUs? A startup might have 10,000. Amazon has hundreds of millions.
Peak reserve QPS? 1,000 or 100,000?
Strict no-oversell, or can you tolerate brief over-reservation with cleanup?
How long can a user hold inventory before it auto-releases? 15 minutes? 24 hours?
Single region or global?

Assume you land here: 10,000 SKUs, 100K peak QPS, strict no-oversell, 15-minute holds, single region.

That last constraint, strict no-oversell, drives every major architectural decision.

The Architecture Has Two Flows

Draw two paths: the reserve flow and the release flow.

Reserve and release architecture flows showing User, API Gateway, Reservation Service, Redis fast path, Postgres source of truth, and Message Queue with delayed expiry Reserve path left-to-right; release path right-to-left after the 15-minute delay. Both paths write to the same Postgres instance.

Reserve flow:

User calls POST /reservations.
API Gateway rate-limits by user and SKU.
Reservation Service checks Redis for available inventory (fast path, microseconds).
If Redis says stock exists, attempt an atomic write to Postgres.
Postgres performs the decrement with a version check.
On success, publish a "release in 15 minutes" event to the message queue.

Release flow:

Expiry worker consumes the event after the delay.
If the reservation is still PENDING, it marks it EXPIRED and returns inventory.
Postgres increments qty_avail in a single transaction.

The Redis layer is a performance optimization. Postgres is the single source of truth. If they diverge, Postgres wins and Redis gets corrected on the next write. Redis accepts this.

The Data Model Is the Contract

Three tables. Most of the interview logic lives in how they relate.

-- Tracks available and reserved quantities per SKU
CREATE TABLE inventory (
    sku           VARCHAR(50) PRIMARY KEY,
    qty_total     INT NOT NULL,
    qty_avail     INT NOT NULL CHECK (qty_avail >= 0),
    qty_reserved  INT NOT NULL,
    version       INT NOT NULL DEFAULT 1
);
-- Invariant: qty_total = qty_avail + qty_reserved

-- One row per hold a user places on inventory
CREATE TABLE reservations (
    reservation_id  UUID PRIMARY KEY,
    user_id         BIGINT NOT NULL,
    sku             VARCHAR(50) REFERENCES inventory(sku),
    quantity        INT NOT NULL,
    status          VARCHAR(20) DEFAULT 'PENDING',  -- PENDING, CONFIRMED, EXPIRED, RELEASED
    expires_at      TIMESTAMP NOT NULL,
    created_at      TIMESTAMP DEFAULT NOW(),
    INDEX (expires_at),    -- expiry cleanup queries
    INDEX (user_id, sku)   -- lookup by user
);

-- Created when a reservation converts to a paid order
CREATE TABLE orders (
    order_id        UUID PRIMARY KEY,
    reservation_id  UUID REFERENCES reservations(reservation_id),
    created_at      TIMESTAMP DEFAULT NOW()
);

The invariant that must never break: qty_total = qty_avail + qty_reserved.

The CHECK (qty_avail >= 0) is your last-resort safety net at the database layer. You enforce it in the application first, but the constraint catches any bug that slips through.

The index on expires_at is non-obvious but critical. Every cleanup query is a range scan against that column.

Entity-relationship diagram showing inventory, reservations, and orders tables with foreign key relationships and the qty invariant annotated Three tables. The invariant on inventory is the load-bearing constraint: break it and your counts are wrong forever.

The APIs Are Simple. The Guarantees Aren't.

Five endpoints. Don't over-design this section.

POST   /reservations              -- create a 15-minute hold
POST   /reservations/:id/confirm  -- convert to order (payment required)
DELETE /reservations/:id          -- release early
GET    /reservations/:id          -- check status
GET    /inventory/:sku            -- check availability

Every mutating request must accept an idempotency key. A user whose POST /reservations times out doesn't know if it succeeded. Without idempotency, their retry creates a double-reservation. With it, you return the cached response from the first attempt.

Store idempotency results (keyed by the client-supplied key) for 24 hours. On retry, skip all business logic and return the cached status and body. See Idempotency in System Design Interviews for the full implementation.

Reserve returns a reservation_id and expires_at. Confirm returns an order_id. Release returns 204. The response shapes are straightforward; spend your interview time on what's hard.

How You Actually Prevent Oversell

This is where most candidates get vague. Don't.

Two users, ten units left, both requesting eight. The race condition without proper concurrency control:

Thread A: SELECT qty_avail WHERE sku='X'   --> 10
Thread B: SELECT qty_avail WHERE sku='X'   --> 10  (stale read)
Thread A: UPDATE qty_avail = 10 - 8 = 2   --> success
Thread B: UPDATE qty_avail = 10 - 8 = 2   --> success (WRONG)
Final qty_avail: 2  (should be -6, or Thread B should be rejected)

Thread B is having a great day. Thread B doesn't know yet.

Optimistic locking with a version column closes this gap without blocking reads.

-- Both threads read: qty_avail = 10, version = 5

-- Thread A writes:
UPDATE inventory
SET qty_avail = qty_avail - 8, version = 6
WHERE sku = 'X'
  AND version = 5       -- version check
  AND qty_avail >= 8;   -- stock check
-- 1 row updated. Success.

-- Thread B writes (still thinks version = 5):
UPDATE inventory
SET qty_avail = qty_avail - 8, version = 6
WHERE sku = 'X'
  AND version = 5
  AND qty_avail >= 8;
-- 0 rows updated. Thread B detects conflict, retries.
-- On retry: reads version = 6, qty_avail = 2. Sees 2 < 8. Rejects.

Check the affected row count after every UPDATE. Zero rows means a conflict or insufficient stock. Retry if it was a version conflict; reject if stock was insufficient.

Optimistic locking works well at normal load. Under heavy contention, like a flash sale, retries spike and latency climbs. That's when you need a different strategy. See Optimistic vs Pessimistic Locking for the full tradeoff comparison.

Hot Partitions: When One SKU Breaks Everything

Limited sneaker drop. 1,000 pairs. 5 million users, most of them bots, all of them convinced they deserve the shoes more than anyone else. Every request hits the same Postgres row.

A single-row inventory counter becomes a hot partition that serializes all writes through one bottleneck.

Hot partition diagram showing 5M concurrent requests converging on a single Postgres row, creating a growing lock queue with timeouts Every reserve request for shoe-X lands on the same row. Postgres serializes writes. Lock queue grows. Latency spikes. Everyone gets a timeout instead of a confirmation.

Two approaches worth discussing in the interview:

Bucket sharding in Redis. Instead of one key per SKU, create 100 keys: inventory:shoe-X:bucket:{hash(user_id) % 100}. Each bucket holds 10 units. User 7,432 always hits bucket 32. Load spreads across 100 independent Redis keys with no cross-key contention. Aggregate the buckets when you need a total count.

Controlled-throughput queue. Drop inbound reserve requests into a queue (SQS, Kafka). A fixed-size worker pool processes them at a rate the Postgres row can handle. Excess requests get a 429 immediately instead of timing out after 30 seconds. Pair this with exponential backoff on the client.

Both options add operational complexity. Pick one and explain it. The interviewer wants to hear you think past "add more database replicas."

Replicas help with reads. Hot partition writes still need to land on a single authoritative row unless you shard. See Hot Partitions in System Design Interviews for more strategies.

Don't Leave Inventory Stranded

A user adds to cart, opens a new tab, gets distracted, ends up watching a video about raccoons, and never comes back. Their hold expires in 15 minutes. You need to return that inventory to the pool automatically.

The most reliable approach combines a delayed message queue with a lazy cleanup fallback.

On reservation creation, publish a delayed event:

PUBLISH expiry_queue {
    reservation_id, sku, quantity,
    release_at: now + 900 seconds
}

The expiry worker consumes this message after the delay. It checks whether the reservation is still PENDING. If yes, it atomically marks it EXPIRED and increments qty_avail and decrements qty_reserved in a single transaction.

The fallback covers dropped messages: a cron job runs every five minutes and scans for rows where expires_at < NOW() AND status = 'PENDING'. The index on expires_at makes this cheap even at millions of rows.

Expiry timeline showing reservation creation at t=0, the delayed message queue firing at t=15 to mark EXPIRED and restore inventory, with cron scans every 5 minutes as a fallback for dropped messages Two safety nets: the message queue fires once at t=15, the cron runs every 5 minutes indefinitely. The cron catches anything the queue drops.

How to Pace the 45 Minutes

0:00 - 5:00    Requirements (strict no-oversell? 15-min holds? peak QPS? regions?)
5:00 - 15:00   High-level architecture (two flows, Redis + Postgres, expiry queue)
15:00 - 25:00  Data model (three tables, version column, invariant, indexes)
25:00 - 35:00  APIs and idempotency
35:00 - 45:00  Deep dive on one hard area (pick based on what the interviewer probes)

Name your decisions out loud. Don't just draw boxes. Say: "I'm using optimistic locking here because at normal load the conflict rate is low and retries are cheap. For hot SKUs during a flash sale, I'd switch to sharded Redis counters."

The interviewer will probe the area where your diagram is vaguest. Narrating out loud is how you find your own gaps before they do.

Common deep-dive directions based on interviewer probes:

"How do you prevent oversell?" Go to the locking section. Walk through the race condition explicitly.
"How does this scale to 1M QPS?" Go to hot partitions. Introduce bucket sharding.
"What happens across services?" Explain the saga: reserve inventory, charge payment, release as a compensating transaction if payment fails.
"What if the expiry worker crashes?" The cron fallback catches it. Expiry events are also retry-safe because the worker checks current status before acting.

Practicing this kind of timed, spoken walkthrough is where most prep falls short. SpaceComplexity runs voice-based system design mock interviews in the actual 45-minute format, with rubric-based feedback on your tradeoff communication, not just whether you drew the right boxes.

What the Inventory Reservation System Design Interview Tests

Inventory reservation is a concurrency problem. Oversell happens when two threads both read the same stale stock count.
Optimistic locking with a version column prevents oversell without blocking reads. Check affected row count after every update.
Redis acts as a fast check layer. Postgres is the authoritative source of truth.
Idempotency keys on reserve requests prevent double-reservation from client retries.
Hot SKUs during flash sales need bucket sharding or a controlled-throughput queue to avoid a single-row bottleneck.
Release inventory with a delayed message queue and a cron fallback. Index on expires_at.
In the interview: name your decisions, explain your tradeoffs, and pace the sections.