Design Amazon's E-Commerce Platform: The System Design Interview Walkthrough

You click "Buy Now." Amazon charges your card, decrements the inventory, assigns a warehouse, prints a label, and sends you a tracking number. Usually in under a second. Across 600 million product listings, from 9.7 million sellers, for 12 million orders every day.

That's the problem. Now design it. In 45 minutes. On a whiteboard. While someone watches and says nothing.

This Amazon system design interview guide covers requirements, the core data model, the five core services, how the system scales, and how to pace yourself in 45 minutes.

Scope It Before You Touch the Whiteboard

Amazon does a lot. You have 45 minutes. What separates good candidates is resisting the urge to design everything. The recommendation engine. The returns workflow. The seller analytics dashboard. All interesting. All out of scope.

Start by asking what the interviewer cares about. A reasonable scope for this problem is:

Users can browse and search products
Users can add items to a cart and place an order
The system reserves inventory to prevent overselling
Orders are paid and fulfilled
Out of scope: recommendations, seller-facing tools, returns, advertising

Write that down. Then get to scale.

Scale That Shapes the Design

These numbers drive every decision you make:

Metric	Estimate
Product listings	600 million
Active sellers	9.7 million
Orders per day	12 million (~140/sec average)
Peak (Prime Day)	~66,000 orders/sec
DynamoDB requests at peak	126 million/sec
Daily revenue	~$1.75 billion

The average throughput is manageable. Prime Day is not. Those DynamoDB numbers should make you nervous. Design for the peak, and you design the right system. That forces decisions around inventory consistency, cache depth, and queue-based checkout that wouldn't appear in a vanilla CRUD walkthrough.

Read:write ratio for browsing vs. ordering is roughly 1000:1. Most traffic never results in a purchase.
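The per-second figures above are plain arithmetic on the table, and it helps to be able to redo them on the whiteboard. A minimal sketch using only the estimates already listed (the variable names are illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Estimates taken from the table above
orders_per_day = 12_000_000
peak_orders_per_sec = 66_000
read_write_ratio = 1_000  # browse requests per order

avg_orders_per_sec = orders_per_day / SECONDS_PER_DAY
peak_multiplier = peak_orders_per_sec / avg_orders_per_sec
avg_reads_per_sec = avg_orders_per_sec * read_write_ratio

print(f"average orders/sec: {avg_orders_per_sec:.0f}")
print(f"peak is roughly {peak_multiplier:.0f}x the average")
print(f"average browse reads/sec: {avg_reads_per_sec:,.0f}")
```

The gap between the average and the peak, several hundred times over, is the number to say out loud. It is what justifies the queueing and caching later on.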

The Architecture Splits Into Two Worlds

Draw a line down the middle of your diagram. Literally. The first mark you make. On the left: discovery (browsing, search, recommendations). On the right: transactions (cart, checkout, orders, payments, inventory).

Discovery is read-heavy and tolerates staleness. Transactions require correctness.

A user seeing slightly stale search results costs you nothing. A user who checks out two units of a sold-out item costs you a customer and a support ticket. The entire architecture flows from that distinction. Get it backwards and you'll end up with strong consistency on product titles and eventual consistency on payments.

[Client]
    |
[API Gateway / Load Balancer]
    |               |
[Discovery Path]  [Transaction Path]
 - Product Service  - Cart Service
 - Search Service   - Inventory Service
 - CDN / Redis      - Order Service
                    - Payment Service

Figure: Two-world architecture split. Discovery (AP) is on the left with Elasticsearch, Redis, and CDN; transactions (CP) are on the right with MySQL, Postgres, and Kafka. Discovery and transaction paths diverge at the load balancer and never share a database.

Data Model: Start Normalized, Split Where You Must

Start with the product model. A product has multiple variants (color, size). Each variant has a SKU, and inventory tracks stock per SKU per warehouse.

-- Product catalog
products        (product_id, title, description, category_id, seller_id)
product_variants(variant_id, product_id, sku, price, attributes JSONB)

-- Inventory (separate service, separate DB)
inventory       (sku, warehouse_id, quantity, reserved_quantity, version)

-- Orders
orders          (order_id, user_id, status, idempotency_key, total_amount, created_at)
order_items     (item_id, order_id, sku, quantity, unit_price)

-- Cart (document store or Redis hash)
carts           (user_id | session_id -> {sku: quantity, ...})

The reserved_quantity column is the key insight for inventory. Stock has two states: available and reserved. You never decrement actual quantity until the order is confirmed. The version field enables optimistic locking to prevent double-decrement bugs.

Figure: Entity-relationship diagram. products to product_variants (1:N), product_variants to inventory via SKU, orders to order_items (1:N), order_items to inventory via SKU. The SKU is the key that ties catalog and inventory together across separate service databases.

The Five Core Services

1. Product Catalog Service

The catalog is the source of truth for product metadata. It is not the source of truth for pricing or inventory. Those live in separate services.

The catalog feeds an Elasticsearch index asynchronously via a Kafka pipeline. The index is denormalized and search-optimized: a single document per variant with pre-joined category names, seller info, and image URLs. Stale search results are acceptable because the real validation happens at checkout, not at browse time.

Cache product pages aggressively in Redis with a 5-minute TTL. For a product that hasn't changed in a week, there is no reason to hit Postgres on every request.

2. Inventory Service

This is the hardest service. Get it wrong and you sell the same laptop to six people. Along with orders, it is one of the few places in the system where you must have strong consistency.

The reservation model has two phases. When checkout begins, create a soft reservation: reserved_quantity += N with a 15-minute TTL stored in Redis alongside the DB record. At payment confirmation, convert the reservation to a permanent decrement: quantity -= N, reserved_quantity -= N. If payment fails or the TTL expires, a release step subtracts N from reserved_quantity again, so abandoned checkouts return their stock without manual cleanup.

Why not decrement at add-to-cart? Because users add items to carts and abandon them constantly. If every cart addition was a hard decrement, you would perpetually under-report stock.
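The reservation lifecycle is easier to reason about as three operations on one stock row: reserve, confirm, release. The sketch below is an in-memory illustration only. The StockRow class, its holds dictionary, and the explicit now argument are assumptions made for the example; a real service would keep the row in the database and the expiry in Redis.

```python
from dataclasses import dataclass, field

RESERVATION_TTL_SECONDS = 15 * 60  # the 15-minute soft-reserve window


@dataclass
class StockRow:
    quantity: int                  # physical units on hand
    reserved_quantity: int = 0     # units held by in-flight checkouts
    holds: dict = field(default_factory=dict)  # reservation_id -> (units, expires_at)

    @property
    def available(self) -> int:
        return self.quantity - self.reserved_quantity

    def reserve(self, reservation_id: str, units: int, now: float) -> bool:
        """Soft reserve at checkout start. Returns False if stock is short."""
        self.release_expired(now)
        if units > self.available:
            return False
        self.reserved_quantity += units
        self.holds[reservation_id] = (units, now + RESERVATION_TTL_SECONDS)
        return True

    def confirm(self, reservation_id: str) -> None:
        """Payment succeeded: turn the hold into a permanent decrement."""
        units, _ = self.holds.pop(reservation_id)  # KeyError if already released
        self.quantity -= units
        self.reserved_quantity -= units

    def release(self, reservation_id: str) -> None:
        """Payment failed or hold expired: give the units back."""
        hold = self.holds.pop(reservation_id, None)
        if hold is not None:
            self.reserved_quantity -= hold[0]

    def release_expired(self, now: float) -> None:
        for reservation_id, (_, expires_at) in list(self.holds.items()):
            if expires_at <= now:
                self.release(reservation_id)
```

Note that quantity never moves until confirm runs, which is the point of tracking reserved stock separately.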

The DB update uses optimistic locking:

UPDATE inventory
SET reserved_quantity = reserved_quantity + 1,
    version = version + 1
WHERE sku = 'ABC-123'
  AND warehouse_id = 42
  AND (quantity - reserved_quantity) >= 1
  AND version = :expected_version;

Zero rows updated means either another writer changed the row first (the version no longer matches) or no available stock remains. Re-read the row, then retry or return out-of-stock.
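To make the retry loop concrete, here is a minimal sketch that runs the same UPDATE against an in-memory SQLite database. SQLite, the reserve_one helper, and the max_attempts parameter are assumptions for illustration; the pattern of reading the version, attempting the conditional update, and checking the affected row count carries over to other SQL databases.

```python
import sqlite3


def reserve_one(conn, sku, warehouse_id, max_attempts=3):
    """Return True if one unit was reserved, False if out of stock or contended."""
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT quantity, reserved_quantity, version FROM inventory "
            "WHERE sku = ? AND warehouse_id = ?",
            (sku, warehouse_id),
        ).fetchone()
        if row is None or row[0] - row[1] < 1:
            return False  # unknown SKU, or no available stock
        cur = conn.execute(
            "UPDATE inventory "
            "SET reserved_quantity = reserved_quantity + 1, version = version + 1 "
            "WHERE sku = ? AND warehouse_id = ? "
            "AND (quantity - reserved_quantity) >= 1 AND version = ?",
            (sku, warehouse_id, row[2]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True
        # rowcount == 0: the row changed since we read it; loop to re-read
    return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (sku TEXT, warehouse_id INTEGER, quantity INTEGER, "
    "reserved_quantity INTEGER, version INTEGER, PRIMARY KEY (sku, warehouse_id))"
)
conn.execute("INSERT INTO inventory VALUES ('ABC-123', 42, 1, 0, 0)")

print(reserve_one(conn, "ABC-123", 42))  # one unit exists, so this succeeds
print(reserve_one(conn, "ABC-123", 42))  # the unit is now reserved, so this fails
```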

Figure: Two-phase inventory reservation. Add to cart (advisory, no DB write), checkout begins (soft reserve with 15-min TTL), then payment OK (hard decrement) or failure (automatic release). The soft reserve is the entire reason your flash sale doesn't sell the same unit to twelve people at once.

3. Cart Service

The cart is advisory. Think of it as a wishlist that might convert. About 70 percent of the time it won't. Never trust the cart's prices at checkout. Always refetch the current price from its source of truth at order creation time. A seller might have changed the price in the thirty seconds since the user added it.

Store carts in Redis as hash maps keyed by user_id. Carts are ephemeral. If you lose a cart, the user is annoyed. If you lose an order, you have a financial dispute.

For logged-out users, persist a session token in a cookie and merge carts on login.
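Two cart operations are worth sketching: merging a guest cart into a user cart on login, and repricing at checkout. The version below makes two assumptions that the text does not specify: quantities for the same SKU are summed on merge, and prices are integer cents. The function names are hypothetical.

```python
def merge_carts(guest_cart: dict, user_cart: dict) -> dict:
    """Merge a logged-out cart into the user's cart, summing duplicate SKUs."""
    merged = dict(user_cart)
    for sku, quantity in guest_cart.items():
        merged[sku] = merged.get(sku, 0) + quantity
    return merged


def reprice(cart: dict, current_prices_cents: dict):
    """Build order lines from current prices only; the cart stores no prices.

    Returns (line_items, dropped_skus, total_cents).
    """
    line_items, dropped = [], []
    for sku, quantity in cart.items():
        price = current_prices_cents.get(sku)
        if price is None:
            dropped.append(sku)  # no longer sold; surface this to the user
            continue
        line_items.append(
            {"sku": sku, "quantity": quantity, "unit_price_cents": price}
        )
    total = sum(i["quantity"] * i["unit_price_cents"] for i in line_items)
    return line_items, dropped, total
```

Because the cart holds only SKUs and quantities, there is no stale price to trust by accident.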

4. Order Service

Orders are an explicit state machine. Define the states and make invalid transitions impossible.

PENDING_PAYMENT
      |
      v (payment authorized)
CONFIRMED
      |
      v (warehouse picked + packed)
SHIPPED
      |
      v (delivery confirmed)
DELIVERED

PENDING_PAYMENT --x--> CANCELLED (timeout or user action)
CONFIRMED ------x--> CANCELLED (before shipped, rare)

Figure: Order state machine. PENDING_PAYMENT to CONFIRMED to SHIPPED to DELIVERED (happy path), with CANCELLED as a terminal state reachable from PENDING_PAYMENT or CONFIRMED. Define the states and make invalid transitions impossible at the code level, not just in the diagram.
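One way to enforce the transitions in code is a lookup table that every status change must pass through. This is a minimal sketch; the OrderStatus enum mirrors the states in the diagram, while the ALLOWED table, InvalidTransition, and transition names are illustrative.

```python
from enum import Enum


class OrderStatus(Enum):
    PENDING_PAYMENT = "PENDING_PAYMENT"
    CONFIRMED = "CONFIRMED"
    SHIPPED = "SHIPPED"
    DELIVERED = "DELIVERED"
    CANCELLED = "CANCELLED"


# Every legal edge in the diagram; anything absent is rejected.
ALLOWED = {
    OrderStatus.PENDING_PAYMENT: {OrderStatus.CONFIRMED, OrderStatus.CANCELLED},
    OrderStatus.CONFIRMED: {OrderStatus.SHIPPED, OrderStatus.CANCELLED},
    OrderStatus.SHIPPED: {OrderStatus.DELIVERED},
    OrderStatus.DELIVERED: set(),   # terminal
    OrderStatus.CANCELLED: set(),   # terminal
}


class InvalidTransition(Exception):
    pass


def transition(current: OrderStatus, target: OrderStatus) -> OrderStatus:
    """Return the new status, or raise if the edge is not in ALLOWED."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

A shipped order cannot be cancelled here because no such edge exists, not because someone remembered to write an if-statement.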

Active orders live in MySQL. Delivered and cancelled orders archive to Cassandra after 30 days. MySQL does not need to hold the full history of every order from 2006. Cassandra's wide-column model handles time-series reads on historical data well.

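Each order carries an idempotency_key (typically a UUID the client generates before submitting), backed by a unique index. In PostgreSQL the server inserts the order using INSERT ... ON CONFLICT (idempotency_key) DO NOTHING; since active orders live in MySQL here, the equivalent is INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. A duplicate submission returns the existing order record. This makes the create-order endpoint safe to retry.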
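The retry-safe insert can be demonstrated with an in-memory SQLite database. This is a sketch under assumptions: SQLite stands in for the real order store, INSERT OR IGNORE is SQLite's spelling of "skip on unique conflict", and create_order is a hypothetical helper.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " status TEXT NOT NULL,"
    " idempotency_key TEXT NOT NULL UNIQUE)"
)


def create_order(conn, user_id, idempotency_key):
    """Insert once per key; a retry with the same key returns the same order_id."""
    conn.execute(
        "INSERT OR IGNORE INTO orders (user_id, status, idempotency_key) "
        "VALUES (?, 'PENDING_PAYMENT', ?)",
        (user_id, idempotency_key),
    )
    conn.commit()
    row = conn.execute(
        "SELECT order_id FROM orders WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    return row[0]


key = str(uuid.uuid4())           # generated by the client before submitting
first = create_order(conn, 7, key)
retry = create_order(conn, 7, key)  # e.g. the client timed out and resent
print(first == retry)
```

The client never needs to know whether its first attempt landed. It resends with the same key and gets the same order back.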

5. Payment Service

The payment flow uses a Saga, not a 2PC. Distributed two-phase commit requires a coordinator and fails hard when that coordinator goes down. That's the thing you were trying to prevent. A Saga breaks the transaction into local commits with compensating actions on failure.

The sequence:

Reserve inventory (local commit in inventory DB)
Authorize payment via external gateway (local commit in payments DB)
Confirm order (local commit in orders DB)

If step 2 fails: release the inventory reservation (compensating action). If step 3 fails: void the payment authorization + release reservation (two compensating actions).

No global coordinator. No locks held across services. Just explicit rollback logic.
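The compensation chain can be sketched as a list of steps, each paired with its undo. The run_saga function below is an illustrative in-process sequencer, not a distributed coordinator: it holds no locks and only remembers which local commits succeeded so it can undo them in reverse. All names are assumptions.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On the first failure, run compensations for completed steps in reverse.
    Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


events = []


def confirm_order():
    raise RuntimeError("orders DB unavailable")  # simulate step 3 failing


ok, log = run_saga([
    ("reserve_inventory", lambda: events.append("reserved"),
     lambda: events.append("reservation released")),
    ("authorize_payment", lambda: events.append("authorized"),
     lambda: events.append("authorization voided")),
    ("confirm_order", confirm_order, lambda: None),
])
print(ok, events)
```

In this run step 3 fails, so the authorization is voided first and the reservation released second, matching the two compensating actions described above. In production the compensations themselves must be retryable, which this sketch does not cover.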

Scaling the Read Path

Product-page reads outnumber orders by roughly 1000 to 1. Scale them by keeping them off the database.

The chain is: CDN for static assets and cacheable product pages, Redis for product metadata (5-minute TTL), Elasticsearch for search queries. Your origin database should see a fraction of your user traffic.

For search, Elasticsearch handles text queries, filters (price range, category, rating), and fuzzy matching. The index gets updated asynchronously when product data changes. A 30-second lag between catalog update and search index update is acceptable.

Images and static content go through CloudFront or equivalent CDN. A product image does not change. Cache it with a long TTL and a content-addressed URL so cache-busting is explicit.

For product availability hints on the search results page (the "Only 3 left in stock" label), serve a Redis cached count refreshed every 60 seconds. Exact accuracy here is not worth the database load.
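The Redis layer in this chain follows the cache-aside pattern: check the cache, fall back to the origin on a miss, and store the result with a TTL. Here is a minimal in-memory sketch. The TTLCache class is a stand-in for Redis, and the injectable clock is an assumption added so the expiry can be shown without waiting.

```python
import time


class TTLCache:
    """Tiny cache-aside helper; a dict stands in for Redis."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                  # fresh hit: origin not touched
        value = loader(key)                  # miss or expired: go to origin
        self._entries[key] = (value, now + self._ttl)
        return value


origin_calls = []


def load_product(product_id):
    origin_calls.append(product_id)          # stands in for a database read
    return {"product_id": product_id, "title": "Example"}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])  # 5-minute TTL

cache.get_or_load("p1", load_product)
cache.get_or_load("p1", load_product)       # served from cache
fake_now[0] = 301.0
cache.get_or_load("p1", load_product)       # TTL passed, reloads
print(len(origin_calls))
```

The same shape works for the 60-second stock hint: only the TTL and the loader change.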

How to Survive a Flash Sale

Imagine 10 million users competing for 10,000 units in 60 seconds. Your system will not survive this without explicit design choices. Every major retailer has a horror story here. The good ones built four layers of defense.

Layer 1: Virtual waiting room. Before users reach the buy button, queue them in a Redis sorted set with random scores. Admit 20,000 users per second. Issue signed JWT tokens. This converts a 10 million user spike into a steady 20K/sec stream.

ZADD waitingroom <random_score> <user_id>
ZPOPMIN waitingroom 20000  -- admit next batch

Redis sorted sets are backed by a skip list paired with a hash table, so ZADD costs O(log N) per member and ZPOPMIN costs O(log(N) * M) when popping M members.
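The waiting-room mechanics can be simulated with a heap in place of the sorted set. This sketch is illustrative: WaitingRoom, join, and admit are hypothetical names, and ignoring repeat joins is an assumption (it resembles ZADD with the NX option, so refreshing the page does not reroll a user's position). Token issuance is left out.

```python
import heapq
import random


class WaitingRoom:
    """Random-order admission queue; a heap stands in for the sorted set."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._heap = []      # (random_score, user_id)
        self._seen = set()

    def join(self, user_id):
        if user_id in self._seen:
            return           # keep the original position on repeat joins
        self._seen.add(user_id)
        heapq.heappush(self._heap, (self._rng.random(), user_id))

    def admit(self, batch_size):
        """Pop up to batch_size users with the lowest scores."""
        count = min(batch_size, len(self._heap))
        return [heapq.heappop(self._heap)[1] for _ in range(count)]

    def __len__(self):
        return len(self._heap)


room = WaitingRoom(seed=1)
for user_id in range(100):
    room.join(user_id)
first_batch = room.admit(20)

# Drain time for the scenario above: 10 million users at 20,000 per second
drain_seconds = 10_000_000 / 20_000
print(len(first_batch), len(room), drain_seconds)
```

Random scores mean arrival order does not matter, which removes the incentive to hammer the site at the exact start time.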

Layer 2: Redis Lua atomic decrement. Never do a read-then-write for inventory under load. Use a Lua script executed atomically on Redis:

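-- KEYS[1] is the stock counter for one SKU
local stock = tonumber(redis.call('GET', KEYS[1]))
-- A missing key yields nil here; treat it the same as sold out
if stock == nil or stock <= 0 then
  return -1
end
redis.call('DECR', KEYS[1])
return 1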

A single Redis node executes the script without interleaving. Throughput: 100,000+ ops/sec. For a hot SKU, shard the counter across 16-32 keys and sum on reads.
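Sharding a hot counter means splitting the stock across several keys so that no single key takes every write. The sketch below models the idea in process. The ShardedStock class is hypothetical, each list slot stands in for one Redis key, and in a real deployment each per-shard decrement would be the atomic script above rather than a plain Python subtraction.

```python
import random


class ShardedStock:
    """Stock split across shard_count counters; reads sum all shards."""

    def __init__(self, total_units, shard_count, seed=None):
        base, extra = divmod(total_units, shard_count)
        # Spread the remainder over the first `extra` shards
        self.shards = [base + (1 if i < extra else 0) for i in range(shard_count)]
        self._rng = random.Random(seed)

    def remaining(self):
        return sum(self.shards)

    def try_decrement(self):
        """Take one unit from a random shard, probing the others if it is empty."""
        start = self._rng.randrange(len(self.shards))
        for offset in range(len(self.shards)):
            index = (start + offset) % len(self.shards)
            if self.shards[index] > 0:
                self.shards[index] -= 1
                return True
        return False  # every shard is empty: sold out


stock = ShardedStock(total_units=10, shard_count=4, seed=3)
results = [stock.try_decrement() for _ in range(15)]
print(sum(results), stock.remaining())
```

The cost is at the tail of the sale: once most shards are empty, a buyer may need to probe several keys before finding the last units.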

Layer 3: Async order pipeline. After the Redis DECR succeeds, publish an order intent to Kafka and return HTTP 202 Accepted immediately. Workers consume from Kafka and write the actual order record. The synchronous path is just two network round-trips.

Layer 4: DB unique constraint as final guard. Even with all the above, put UNIQUE(user_id, sale_id) on the orders table. It prevents a user from exploiting race conditions to buy twice. It is the safety net, not the primary defense.
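Layers 3 and 4 fit together as an accept-now, write-later pipeline whose final insert is guarded by the unique constraint. In this sketch a deque stands in for the Kafka topic and in-memory SQLite for the orders database. The accept and drain helpers are hypothetical names.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " sale_id INTEGER NOT NULL,"
    " sku TEXT NOT NULL,"
    " UNIQUE (user_id, sale_id))"
)

intents = deque()  # stand-in for the Kafka topic


def accept(user_id, sale_id, sku):
    """Synchronous path: enqueue the intent and answer immediately."""
    intents.append((user_id, sale_id, sku))
    return 202  # HTTP 202 Accepted


def drain(conn, intents):
    """Worker path: write orders, letting the constraint reject duplicates."""
    written, rejected = 0, 0
    while intents:
        user_id, sale_id, sku = intents.popleft()
        try:
            conn.execute(
                "INSERT INTO orders (user_id, sale_id, sku) VALUES (?, ?, ?)",
                (user_id, sale_id, sku),
            )
            conn.commit()
            written += 1
        except sqlite3.IntegrityError:
            conn.rollback()
            rejected += 1  # same user, same sale: second purchase blocked
    return written, rejected


accept(1, 7, "ABC-123")
accept(1, 7, "ABC-123")  # duplicate that slipped past the earlier layers
accept(2, 7, "ABC-123")
print(drain(conn, intents))
```

A rejected duplicate has already consumed a unit from the Redis counter, so a real worker would presumably need to return it; that step is omitted here.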

Figure: Flash sale defense. Four stacked layers showing Virtual Waiting Room, Redis Lua DECR, Kafka Async Pipeline, and DB UNIQUE constraint, with throughput numbers at each stage. Throughput drops at each layer. By the time a request hits the database, it has already earned its spot.

Which Services Need Strong Consistency?

Every service has a different answer to "availability vs. consistency."

Service	Consistency Need	Why
Inventory	Strong (CP)	Overselling is a financial error
Orders	Strong (CP)	Immutable financial record
Product Catalog	Eventual (AP)	Stale title text costs nothing
Search Index	Eventual (AP)	30s lag is fine
Cart	Soft (AP)	Revalidated at checkout
Notifications	Eventual (AP)	Delayed email is fine

Stating this table in an interview demonstrates that you understand the CAP theorem as a tool for reasoning, not a buzzword to drop before your interviewer's coffee goes cold.

How to Pace the Amazon System Design Interview

This is more content than 45 minutes can hold. Here is what to prioritize.

0-5 min: Clarify scope. Agree on the six core flows: browse, search, cart, checkout, order, payment. Write them down visibly.

5-10 min: Scale estimates. Derive peak orders/sec. State the 1000:1 read/write ratio. Pick your consistency tiers.

10-20 min: Draw the two-world architecture (discovery vs. transaction). Name each service. Sketch the data model for inventory and orders specifically. These are the hardest parts and your interviewer knows it.

20-35 min: Go deep on inventory. Walk through the reservation model, the optimistic lock query, and what happens when the payment fails. Then walk through the order state machine and the Saga compensation chain.

35-45 min: Cover scaling. Redis cache for reads, Elasticsearch for search, the flash sale queue. Bring up the tradeoff table. Name what you would defer (recommendations, returns, internationalization) and why.

If you get stuck, say what the constraint is: "I'm choosing availability over consistency here because the cost of a stale catalog page is zero." Showing your reasoning is worth more than the answer itself. Going silent is the one move that leaves nothing on the page for the interviewer to write down.

Putting It Into Practice

Reading about this is one layer of understanding. Explaining it out loud under pressure is a completely different skill. System design interviews test your ability to structure ambiguity on the fly, defend tradeoffs, and communicate while the clock is running.

SpaceComplexity gives you voice-based mock interviews with rubric feedback on exactly this: how clearly you communicate your design, how well you scope the problem, and whether your tradeoffs hold up under follow-up questions. You can run through this problem end-to-end and get a score before the real thing.

Recap

Split discovery (read-heavy, eventual consistency OK) from transactions (write-heavy, strong consistency required)
The inventory table uses reserved_quantity + optimistic locking. Never decrement at add-to-cart
The cart is advisory. Revalidate everything at checkout
Orders are a state machine backed by MySQL for active orders, Cassandra for history
Payment uses a Saga with compensating actions, not 2PC
Flash sales need a virtual waiting room, a Redis Lua atomic DECR, an async order pipeline, and a DB unique constraint as the final guard
Cache product pages in Redis and CDN. The database should not see raw browse traffic