Design a Distributed Message Queue: The System Design Interview Walkthrough

Every time you say "let's make this async," you are describing a message queue. The order service shouldn't wait for the email service. The payment processor shouldn't care whether the analytics pipeline is up. You decouple them with a queue that sits in between, absorbs the message, and delivers it on its own schedule.

Designing that queue from scratch is a standard system design question at event-driven companies. It forces you to reason about durability, ordering, delivery guarantees, and throughput all at once. This walkthrough covers the complete 45-minute path: requirements, architecture, data model, API, scaling, and the tradeoffs you need to name out loud before your interviewer asks.

The state of message queue knowledge in the wild, visualized:

Expanding brain meme: using RabbitMQ / PostgreSQL / in-memory struct / Google Sheets as a message queue When the team says "we just need something simple." From r/ProgrammerHumor.

Start Here: The Clarifying Questions That Actually Change the Design

Before touching architecture, ask these. Each one shifts the design in a fundamentally different direction. Skipping them is the fastest way to spend 35 minutes building the wrong system.

What throughput are we targeting? Millions of messages per second changes your storage and network choices entirely.
What are the ordering requirements? Total global ordering is expensive. Ordering per key is achievable. No ordering is cheapest.
What delivery semantics does the consumer need? At-most-once, at-least-once, or exactly-once? Most teams live with at-least-once and make their consumers idempotent.
How long should messages be retained? Hours, days, forever? Retention determines storage cost.
Push or pull? Consumers poll for messages or the broker pushes them? This changes the client contract entirely.
What's the latency budget? Sub-millisecond favors in-memory routing. Single-digit milliseconds is achievable with disk-backed logs.

For this walkthrough, assume: millions of messages per second, at-least-once delivery, per-key ordering, seven-day retention, pull-based consumers, and p99 latency under 15ms.

The Non-Negotiable Numbers

Interviewers notice when you work from numbers rather than vibes. Anchor the design to concrete targets before drawing any boxes.

Throughput: 1 million messages per second at peak (Kafka clusters routinely sustain this on commodity hardware)
Latency: p99 end-to-end under 15ms (batching plus sequential disk writes make this achievable)
Durability: messages survive broker crashes (requires replication with acknowledgment)
Availability: 99.99%, meaning the system serves reads and writes even when nodes fail
Retention: messages stored for up to 7 days regardless of whether consumers have read them

Four Components, One Job Each

The high-level diagram is simple. Producers write, brokers store, consumers read, and a coordinator keeps the cluster sane.

System architecture: Producers write to the Broker Cluster, Consumer Group polls from Brokers, KRaft Coordinator manages metadata and partition leadership The full system. Producers are dumb senders, consumers are dumb receivers, and the brokers do all the interesting work.

Producers are any service with a message to send. They know which topic to write to and optionally supply a partition key.

Brokers are the servers that store and serve messages. A cluster partitions the load. Each broker owns a subset of partitions as their leader and holds replicas of others as a follower.

Consumers pull messages from brokers. Consumers organized into a consumer group cooperate so each partition is processed by exactly one consumer in the group at a time.

The coordinator (historically ZooKeeper, replaced in Kafka 4.0 by the internal KRaft Raft quorum) stores cluster metadata: which broker owns which partition, the current ISR lists, and consumer group offsets. One node, a lot of responsibility.

The Append-Only Log: Why Sequential Writes Win

The storage primitive underneath all of this is the partition: an ordered, append-only log on disk. A topic is just a named collection of partitions. Each message has an immutable, monotonically increasing offset that never changes.

Partition as an append-only log showing committed offsets 0-5 (green), uncommitted offsets 6-7 above the High Watermark (amber), the Log End Offset (LEO), and consumer read boundary at HW HW is where consumers can read. LEO is where the producer just wrote. The gap between them is the danger zone: written to the leader, not yet confirmed by the ISR.

Partitioning assignment: producers supply a partition key (say, user_id). The broker computes partition = hash(key) % num_partitions. All messages with the same key land on the same partition in arrival order. Per-key ordering preserved. Exactly like a hash map's bucket assignment: uniform distribution is only as good as the hash function and key distribution.

Messages with no key are round-robined across partitions. Global ordering across all partitions requires collapsing everything to a single partition, which kills all parallelism. Not an option at scale.

Why append-only? Sequential disk writes on commodity SSDs reach 500 MB/s+. Random writes top out at 50 MB/s. The log structure means producers never pay for random I/O, and sequential consumer reads get handled by the OS prefetcher for free. It's not magic. It's just the right data structure.

Replication: What Keeps You Employed After the Outage

A single broker holding a partition is a single point of failure. The solution is replication. Each partition has one leader and N-1 followers (typical production: replication factor 3).

ISR replication showing Leader with LEO=12 and HW=10, Follower A in ISR at offset 10, Follower B lagging at offset 6 and evicted from ISR, consumer read boundary at HW=10 Follower B had one job. The ISR is the subset of replicas that are close enough to count. Only messages acknowledged by all ISR members advance the High Watermark.

The In-Sync Replica (ISR) set is the subset of replicas within replica.lag.time.max.ms of the leader. If a follower falls too far behind, the leader evicts it from the ISR. Think of the ISR as the replicas you can actually trust right now.

The High Watermark (HW) is the offset where all ISR members have acknowledged. Consumers can only read up to the HW. This prevents reading messages that might disappear if the leader crashes before followers catch up.

The acks parameter is the durability knob producers control:

acks=0: fire and forget, no confirmation
acks=1: leader confirms write, followers may not have it yet (default for throughput)
acks=all: all ISR members confirm, safest, roughly 2x latency cost

When a leader fails, the controller promotes the highest-offset ISR member. Non-ISR replicas can be promoted too (unclean leader election), but this risks data loss and is off by default. Turn it on only if availability beats correctness for your use case.

Pull, Not Push: Why Consumers Are in Control

Consumers poll the broker for messages. The broker does not push. This is deliberate.

Push puts backpressure complexity on the broker: it must track which consumers are overloaded, throttle sends, and buffer per-consumer. That's a lot of state on a shared system. Pull puts that logic on consumers, which are independently scalable, can batch reads to amortize network round-trips, and can process at whatever rate they can handle.

Consumer group with 3 consumers owning 6 partitions: Consumer 1 owns P0 and P1, Consumer 2 owns P2 and P3, Consumer 3 owns P4 and P5 Each partition is assigned to exactly one consumer in the group. Adding a 4th consumer to a 3-partition topic does nothing except give someone something to debug.

Consumer groups scale linearly with partition count, up to num_partitions consumers. Adding a 4th consumer to a 3-partition topic gains nothing. This is a hard constraint. State it in the interview before your interviewer has to correct you.

Offset commits track consumer progress. Consumers commit their current offset to the internal __consumer_offsets topic after processing. On restart or rebalance, they resume from the last committed offset. Commit before processing and you risk at-most-once delivery. Commit after and you get at-least-once. Commit after and process idempotently, using a unique message_id in a deduplication store (Redis or Postgres with TTL), and you get the practical default.

Three Delivery Guarantees, One Practical Default

At-most-once: producer sends, doesn't retry. Consumer reads before committing offset. Fast. Acceptable for metrics where occasional gaps are fine.

At-least-once: producer retries on timeout. Consumer commits after processing. Messages may be delivered more than once on crashes. Design consumers to be idempotent. This is the practical default for production systems.

Exactly-once: Kafka's transactional API issues each producer a producer_id and epoch. The broker deduplicates using (producer_id, sequence_number). Consumers reading from a transactional topic see committed batches only. The cost is real overhead and API complexity. Reserve it for financial events where duplicate processing causes actual harm, like charging someone twice.

Dead Letter Queues: The Poison Pill Safety Valve

No consumer is perfect. Processing fails on malformed messages, transient errors, and dependency outages. Without a safety valve, one bad message blocks a partition forever. That's not a hypothetical. That's a Saturday at 2am.

Route messages that fail after N retries to a dedicated dead letter topic (topic-name.dlq). Include original metadata (topic, partition, offset, exception stack trace) as message headers. Ops can inspect failed messages, fix the root cause, and replay them. The DLQ must support replay, not just storage, or it's just a fancy place for messages to die quietly.

Three Bottlenecks Worth Naming Out Loud

Horizontal scaling is straightforward. Add partitions, add brokers, add consumers in lockstep. But three real bottlenecks will bite you, and naming them before being asked is a strong signal to interviewers.

Coordinator throughput. A single active controller handles all metadata changes: partition leadership changes, consumer group rebalances, broker registration. Under high churn (many consumer restarts at once), the controller becomes a hotspot. KRaft improved this over ZooKeeper, but it is still a single-threaded state machine. Consumer rebalance storms, where many consumers restart simultaneously and trigger waves of partition reassignments, are a known production hazard.

Disk I/O on hot leaders. Producers write to the leader. Followers pull from the leader. A leader serving a hot partition handles writes plus N-1 follower fetch requests simultaneously. Add brokers, move partitions, and avoid hot keys that route all traffic to one partition. A single user_id that generates 90% of your traffic will ruin your day.

Consumer lag. If consumers process slower than producers produce, lag accumulates. Monitor consumer_group_lag per partition. Add consumers up to partition count, optimize consumer processing, or increase batch size to amortize overhead.

The Tradeoffs Your Interviewer Wants to Hear First

Name these explicitly. Don't wait to be asked.

Decision	Option A	Option B
Ordering	Per-key (scales)	Global (single partition, no scale)
Delivery	At-least-once + idempotent consumer	Exactly-once (higher cost)
Consumer model	Pull (consumer controls rate)	Push (broker controls delivery)
Durability	`acks=all` + ISR (safe, higher latency)	`acks=1` (faster, data loss risk)
Retention	Time-based deletion	Log compaction (keep latest per key)

Log compaction deserves a mention. Instead of deleting messages older than N days, the broker keeps only the latest message per key. Useful for configuration or state topics where only the current value matters. The changelog pattern in stream processing depends on it.

The push vs pull tradeoff comes up almost every time. The choice between push and pull is never obvious: pull gives consumers control but means you must poll even when idle; push minimizes latency but requires the broker to track per-consumer state.

The 45-Minute Interview Clock

Walk in with a plan. Every section maps to a time box, and your interviewer is watching the clock whether they seem like it or not.

Minutes 0-5: Requirements. Ask the clarifying questions above. Write throughput, latency, ordering, and delivery semantics on the board. Do not skip this. Requirements drive every downstream choice, and skipping them is a Communication score 2.

Minutes 5-10: High-level diagram. Draw producers, broker cluster, consumer group, coordinator. Name the components. This is the map you reference for the rest of the interview.

Minutes 10-25: Core design. Explain the partition log. Derive why sequential writes win on disk. Walk through replication: ISR, high watermark, acks parameter. Explain consumer groups and the partition-to-consumer assignment constraint. Cover offset commits and at-least-once delivery.

Minutes 25-35: Deep dives. Your interviewer will probe here. Likely targets: exactly-once semantics, dead letter queue design, handling hot partitions, consumer rebalance storms. Go deep where asked.

Minutes 35-45: Tradeoffs and follow-ups. Revisit the tradeoff table. State what you would monitor in production: consumer lag, ISR shrinks, controller election rate. If time remains, discuss log compaction.

Narrating a system design out loud is a different skill than knowing the material. SpaceComplexity runs rubric-based voice mock interviews specifically for system design rounds, so you can practice the full 45-minute walk under realistic conditions and get feedback on where you're losing signal.

Recap

Topics are collections of ordered, append-only partitions. Each partition is an immutable offset-indexed log.
Partition keys route messages deterministically using hash(key) % num_partitions, preserving per-key ordering.
ISR replication with a high watermark prevents consumers from reading uncommitted data.
The acks parameter is the durability-latency tradeoff knob.
Consumer groups assign one partition per consumer. Max useful consumers equals partition count.
At-least-once with idempotent consumers is the practical default. Exactly-once is available but costly.
Dead letter queues catch unprocessable messages without blocking partitions.
Coordinator throughput, disk I/O on hot partitions, and consumer lag are the three real bottlenecks.