Message Queue vs Pub/Sub: The System Design Interview Guide

You're designing a notification system. An order comes in. The email service needs to send a confirmation. The analytics pipeline needs to record the event. The fraud detection service needs to check it. Your first instinct: three synchronous API calls from the order service. Chain them. Simple.

Then the fraud service goes down. Now orders aren't completing. Then the analytics pipeline has a slow deploy. Now orders are timing out. Then someone adds a loyalty points service and you get to touch the order service again. Congratulations. You've built a distributed monolith with extra failure modes.

This is the moment the message queue vs pub/sub decision actually matters. Candidates who can name the right pattern, explain the tradeoff, and defend it tend to pass. Candidates who conflate the two, or name-drop Kafka without explaining why, tend to give interviewers something to write about. Not in a good way.

Queue vs Pub/Sub: The One-Line Distinction

A queue distributes work. Pub/sub broadcasts events.

That's the complete conceptual difference. Everything else is a consequence of it.

In a message queue (the point-to-point pattern), each message is handled by one worker. Ten jobs, three workers: each job goes to one worker and disappears once that worker finishes it. No other worker sees it. The broker's whole job is making sure some worker handles it.

In pub/sub (publish-subscribe), each message is delivered to every subscriber. Three services care about an "order placed" event? All three get a copy, independently. The publisher has no idea who's listening and doesn't care. It posts the event and walks away.

The question to ask yourself: does each message represent a unit of work one worker should do, or an event that multiple services need to react to?
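
Here is a minimal in-memory sketch of that distinction. It assumes nothing about any real broker, and the function names are hypothetical. The queue gives each message to one worker (round-robin here). The topic gives every message to every subscriber.

```python
from itertools import cycle

def queue_deliver(messages, workers):
    """Point-to-point: each message goes to exactly one worker (round-robin here)."""
    received = {w: [] for w in workers}
    for msg, worker in zip(messages, cycle(workers)):
        received[worker].append(msg)
    return received

def pubsub_deliver(messages, subscribers):
    """Publish-subscribe: every subscriber gets its own copy of every message."""
    return {s: list(messages) for s in subscribers}

print(queue_deliver(["job1", "job2", "job3"], ["A", "B", "C"]))
# {'A': ['job1'], 'B': ['job2'], 'C': ['job3']}
print(pubsub_deliver(["order.placed"], ["email", "analytics", "fraud"]))
# {'email': ['order.placed'], 'analytics': ['order.placed'], 'fraud': ['order.placed']}
```

Count the deliveries. Ten messages to a queue means ten deliveries in total. Ten messages to a topic with three subscribers means thirty.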

The Queue: One Worker Per Job

                    ┌───────────────────────────┐
                    │          Queue            │
Producer ──────────►│  [job1] [job2] [job3] ... │
                    └───────────────────────────┘
                           │       │       │
                        Worker A  Worker B  Worker C
                        (job1)    (job2)    (job3)

A producer sends messages to a broker. The broker holds them durably. Workers pull, process one message, and acknowledge it. After acknowledgment, the broker deletes the message. If a worker crashes mid-job before acknowledging, the broker redelivers to another worker. The job doesn't disappear just because one worker had a bad day.

This is at-least-once delivery: no message is lost, but you may process the same message twice if a worker crashes after processing but before acknowledging. Consumers must be idempotent. Welcome to distributed systems, where correctness is a "well, usually" proposition.
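
The toy broker below shows where that duplicate comes from. It is a sketch, not any broker's API. A received message is hidden but not deleted, and only `ack` removes it. The hypothetical `requeue_unacked` method stands in for an SQS visibility timeout expiring or a RabbitMQ consumer connection dropping.

```python
from collections import deque
from itertools import count

class AtLeastOnceQueue:
    """Toy broker: a message is deleted only after ack; unacked messages come back."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}           # receipt -> message (hidden, not deleted)
        self._receipts = count(1)

    def send(self, msg):
        self._ready.append(msg)

    def receive(self):
        if not self._ready:
            return None
        msg = self._ready.popleft()
        receipt = next(self._receipts)
        self._in_flight[receipt] = msg
        return receipt, msg

    def ack(self, receipt):
        self._in_flight.pop(receipt)   # only now is the message gone for good

    def requeue_unacked(self):
        # Simulates a crashed or timed-out worker: its messages become visible again.
        self._ready.extend(self._in_flight.values())
        self._in_flight.clear()

q = AtLeastOnceQueue()
q.send("resize image 42")

receipt, msg = q.receive()
processed = [msg]            # worker A finishes the work...
q.requeue_unacked()          # ...but crashes before calling q.ack(receipt)

receipt, msg = q.receive()   # worker B receives the same message
processed.append(msg)
q.ack(receipt)

print(processed)  # ['resize image 42', 'resize image 42']
```

Nothing was lost, but the work ran twice. That is the whole reason the consumer has to be idempotent.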

Use this pattern when:

You have CPU-heavy work to distribute across many workers (video encoding, report generation, email sending)
You want to buffer a traffic spike so downstream services aren't overwhelmed
Exactly one service should handle each job, not multiple

Dead letter queues (DLQs) are part of the pattern. Once you configure one (in SQS, a redrive policy with a maxReceiveCount), the broker moves a message to the DLQ after it has been delivered that many times without a successful acknowledgment. This keeps poison-pill messages from blocking the main queue indefinitely and gives you a graveyard to inspect failures later. Mention DLQs in your interview answer. Candidates who skip error handling for unprocessable messages get downgraded on thoroughness, and they deserve it.
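
The sketch below shows the retry-then-park logic in memory. `MAX_RECEIVES` and `drain` are hypothetical names, and the threshold plays the role SQS's maxReceiveCount plays. Failed messages go to the back of the queue so healthy ones keep flowing. After the threshold, they land in the DLQ.

```python
from collections import deque

MAX_RECEIVES = 3  # hypothetical threshold, analogous to SQS maxReceiveCount

def drain(messages, handler, max_receives=MAX_RECEIVES):
    """Process a queue, moving messages that fail max_receives times to a DLQ."""
    queue = deque((msg, 0) for msg in messages)   # (message, receive count)
    done, dlq = [], []
    while queue:
        msg, receives = queue.popleft()
        receives += 1
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            if receives >= max_receives:
                dlq.append(msg)                   # park the poison pill for inspection
            else:
                queue.append((msg, receives))     # retry later, behind other work
    return done, dlq

def handler(msg):
    if msg == "malformed":
        raise ValueError("cannot parse")

done, dlq = drain(["ok-1", "malformed", "ok-2"], handler)
print(done, dlq)  # ['ok-1', 'ok-2'] ['malformed']
```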

Pub/Sub: Every Subscriber Gets a Copy

                    ┌──────────────────────────┐
                    │   Topic: order.placed    │
Publisher ─────────►│                          │
                    └──────────────────────────┘
                       │            │            │
               Email Service   Analytics   Fraud Detection
               (own copy)      (own copy)  (own copy)

The publisher sends one message to a topic. The broker fans it out to every subscriber. Each subscriber has its own copy and processes it at its own pace, independently of the others. If analytics is slow, that's analytics's problem. Email doesn't wait.

The key property is loose coupling: the publisher has no idea how many subscribers exist, and you can add new ones without touching it. Loyalty points service wants to react to order events? Add the subscriber. Don't touch the order service. Don't wake up anyone at 2am. Just subscribe.
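
This toy topic makes the coupling visible. It is an in-memory sketch, not a real client library. Each subscription owns its own backlog. The publisher loops over whatever subscriptions exist and never names them, so adding `loyalty` needs no change to the publishing code.

```python
from collections import deque

class Topic:
    """Toy pub/sub topic: each subscription gets an independent copy of every event."""

    def __init__(self, name):
        self.name = name
        self._subscriptions = {}            # subscriber name -> its own backlog

    def subscribe(self, subscriber):
        self._subscriptions[subscriber] = deque()
        return self._subscriptions[subscriber]

    def publish(self, event):
        # The publisher never names its consumers; the topic fans out.
        for backlog in self._subscriptions.values():
            backlog.append(event)

orders = Topic("order.placed")
email = orders.subscribe("email")
analytics = orders.subscribe("analytics")
orders.publish({"order_id": 1})

loyalty = orders.subscribe("loyalty")       # added later; publisher code unchanged
orders.publish({"order_id": 2})

email.popleft()                             # email consumes at its own pace
print(len(email), len(analytics), len(loyalty))  # 1 2 1
```

The late-joining `loyalty` subscriber only sees events published after it subscribed. That is typical of pub/sub without retention. Kafka's on-disk log is what lets late subscribers catch up.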

Use pub/sub when:

Multiple services need to react to the same event
You want to add consumers without modifying the producer
You're building event-driven microservices
One event should trigger many independent downstream workflows

The Systems You'll Actually Name

Don't just say "a message queue." Name a specific system and explain why. "I'd use a message queue" tells an interviewer you've read a blog post. "I'd use Kafka because we need replay capability and the analytics pipeline will need to backfill" tells them you've thought about it.

System	Pattern	Persistence	Ordering	Best For
Apache Kafka	Both (via consumer groups)	Yes, configurable retention	Per partition	High-throughput streams, event sourcing, replay
RabbitMQ	Queue (fanout exchange for pub/sub)	Yes	Per queue, single consumer	Complex routing, lower latency
AWS SQS	Queue	Yes, up to 14 days	FIFO variant only	Serverless AWS stack
AWS SNS	Pub/sub	No	FIFO topics only	Fan-out to SQS queues or HTTP endpoints
Google Cloud Pub/Sub	Pub/sub	Yes	Per ordering key (opt-in)	GCP-native event streaming
Redis Pub/Sub	Pub/sub	No	None	In-memory, fire-and-forget
Redis Streams	Queue with replay	Yes	Per stream	Durable queuing without Kafka overhead

Kafka handles both patterns in one system. A consumer group reading a topic works like a queue: Kafka assigns each partition to one consumer in the group, so each message is handled by only one consumer in that group. Multiple independent consumer groups reading the same topic work like pub/sub: each group gets all events and tracks its own offsets. Kafka also retains messages on disk regardless of consumption, so late subscribers can catch up and you can replay historical data. It is extremely good at being Kafka.
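
Here is a simplified model of both behaviors on one partitioned log. It assumes round-robin partition assignment. Real Kafka delegates assignment to the group coordinator and pluggable assignors (range, round-robin, sticky), and `poll_group` is a hypothetical method. The records never move. Each group only advances its own offsets.

```python
class Log:
    """Toy partitioned log: records stay put; each consumer group only tracks offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = {}   # (group, partition) -> next offset to read

    def append(self, partition, msg):
        self.partitions[partition].append(msg)

    def poll_group(self, group, members):
        """Assign partitions round-robin to members, then read everything new."""
        received = {m: [] for m in members}
        for p, records in enumerate(self.partitions):
            owner = members[p % len(members)]        # one owner per partition per group
            start = self.offsets.get((group, p), 0)
            received[owner].extend(records[start:])
            self.offsets[(group, p)] = len(records)  # "commit" the new offset
        return received

log = Log(num_partitions=2)
for i in range(4):
    log.append(i % 2, f"order-{i}")

# Queue semantics: within one group, each record goes to exactly one member.
print(log.poll_group("fraud", ["fraud-1", "fraud-2"]))
# {'fraud-1': ['order-0', 'order-2'], 'fraud-2': ['order-1', 'order-3']}

# Pub/sub semantics: a second group independently reads every record.
print(log.poll_group("email", ["email-1"]))
# {'email-1': ['order-0', 'order-2', 'order-1', 'order-3']}
```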

The SNS plus SQS combination is the canonical AWS fan-out pattern. SNS fans an event to multiple SQS queues. Each SQS queue has its own consumer workers. Pub/sub fan-out at the top, queue-based work distribution at the bottom. Clean, boring, and effective.
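
The two layers can be sketched with in-memory stand-ins. The code makes no AWS SDK calls, and `publish`, `run_workers` and the `queues` dict are hypothetical. The top layer copies each event into every queue. The bottom layer has competing workers, and each worker takes one message at a time from its own service's queue.

```python
from collections import deque
from itertools import cycle

# Hypothetical wiring: one topic (SNS-like) fans out to one queue (SQS-like) per service.
queues = {"email": deque(), "analytics": deque()}

def publish(event):
    for q in queues.values():
        q.append(event)                  # fan-out: every queue gets a copy

def run_workers(queue, workers):
    """Competing consumers: each message in this queue goes to one worker."""
    handled = {w: [] for w in workers}
    for worker in cycle(workers):
        if not queue:
            break
        handled[worker].append(queue.popleft())
    return handled

for order_id in (1, 2, 3):
    publish(order_id)

print(run_workers(queues["email"], ["email-w1", "email-w2"]))
# {'email-w1': [1, 3], 'email-w2': [2]}
print(run_workers(queues["analytics"], ["analytics-w1"]))
# {'analytics-w1': [1, 2, 3]}
```

Email and analytics scale their worker counts independently, and neither queue's backlog affects the other.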

Three Tradeoffs That Decide the Architecture

Ordering: You Probably Don't Actually Need It

Most systems guarantee ordering only within a single queue or partition. Kafka guarantees ordering per partition. Messages with the same key route to the same partition, so they're processed in order relative to each other. Globally across partitions? No guarantee. The universe does not promise you global order and neither does Kafka.
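
Key-based routing is what makes per-key ordering work. The sketch below uses `zlib.crc32` as a stand-in hash. Kafka's default partitioner uses murmur2, so real partition numbers will differ. The property it shows still holds: the same key always lands on the same partition, so that key's events stay in publish order.

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash (Kafka's default partitioner uses murmur2, not CRC32).
    # What matters: a given key always maps to the same partition.
    return zlib.crc32(key.encode()) % num_partitions

partitions = [[] for _ in range(NUM_PARTITIONS)]
events = [("user-7", "created"), ("user-9", "created"),
          ("user-7", "paid"), ("user-7", "shipped")]
for key, event in events:
    partitions[partition_for(key)].append((key, event))

p = partition_for("user-7")
print([e for k, e in partitions[p] if k == "user-7"])  # ['created', 'paid', 'shipped']
```

The mapping depends on the partition count. Adding partitions to an existing topic sends some keys to new partitions, which breaks per-key ordering across the change.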

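SQS standard queues have no ordering guarantee at all. SQS FIFO queues guarantee strict order within a message group, but by default they cap throughput at 300 API calls per second per action (3,000 messages per second with batching). High-throughput mode raises that ceiling, but the cap still matters at scale.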

If you need strict global ordering across millions of messages, you have a hard problem. The real answer is: you probably don't need it. Design consumers to be order-tolerant, or scope ordering to a key (user ID, order ID) rather than requiring it globally. Saying "we probably don't need global ordering here, and here's why" out loud in an interview is a strong signal. It shows you understand why the constraint is expensive, not just that it sounds hard. A total order means funneling everything through one partition or one consumer, and that caps throughput.
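
One common way to make a consumer order-tolerant is a per-key version check. The sketch assumes the producer stamps each event with a per-order sequence number. The `version` field is hypothetical, not something any broker adds for you. Stale events are dropped, so out-of-order arrival and duplicates both become harmless.

```python
from dataclasses import dataclass

@dataclass
class OrderEvent:
    order_id: str
    version: int        # hypothetical per-order sequence number set by the producer
    status: str

class OrderStatusView:
    """Order-tolerant consumer: applies an event only if it is newer than what it has."""

    def __init__(self):
        self.latest = {}    # order_id -> (version, status)

    def apply(self, event: OrderEvent) -> bool:
        current = self.latest.get(event.order_id)
        if current is not None and current[0] >= event.version:
            return False    # stale or duplicate: ignore it
        self.latest[event.order_id] = (event.version, event.status)
        return True

view = OrderStatusView()
# Delivered out of order: "shipped" (v3) arrives before "paid" (v2).
for e in [OrderEvent("o1", 1, "placed"), OrderEvent("o1", 3, "shipped"), OrderEvent("o1", 2, "paid")]:
    view.apply(e)
print(view.latest["o1"])  # (3, 'shipped')
```

This works for "latest state wins" data like order status. It does not work for events that must each be applied, such as balance increments.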

Delivery Semantics: Default to At-Least-Once

Three modes exist:

At-most-once: messages may be lost, never duplicated. Simple, and occasionally acceptable for metrics or logging where a dropped event won't ruin your day.
At-least-once: messages are never lost, may be delivered more than once. The practical default for production.
Exactly-once: Kafka supports this within a Kafka-to-Kafka pipeline via idempotent producers and transactional APIs. End-to-end exactly-once in the real world requires idempotency keys checked against a deduplication store on the consumer side.
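
The first two modes differ mainly in when the consumer acknowledges. In this simplified simulation, `run` is a hypothetical helper and the worker dies while handling `crash_on`. Acking first loses the in-flight message. Acking after the work keeps it, so the broker hands it out again.

```python
def run(messages, ack_first, crash_on):
    """One consumer pass that dies while handling `crash_on`.

    Returns (processed, redeliver): what the worker finished, and what the
    broker still considers unacknowledged and will hand out again.
    """
    processed, acked = [], []
    for msg in messages:
        if ack_first:
            acked.append(msg)      # at-most-once: commit before doing the work
        if msg == crash_on:
            break                  # worker dies mid-job
        processed.append(msg)
        if not ack_first:
            acked.append(msg)      # at-least-once: commit only after the work
    redeliver = [m for m in messages if m not in acked]
    return processed, redeliver

print(run(["m1", "m2", "m3"], ack_first=True, crash_on="m2"))
# (['m1'], ['m3'])        m2 was acked but never processed: lost
print(run(["m1", "m2", "m3"], ack_first=False, crash_on="m2"))
# (['m1'], ['m2', 'm3'])  m2 comes back for another attempt
```

If the at-least-once worker had crashed after finishing m2 but before acking it, m2 would still come back and run twice. That is the duplicate case.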

In interviews, say "at-least-once with idempotent consumers" unless you have a specific reason for exactly-once. Claiming exactly-once without explaining consumer-side idempotency invites follow-up questions you probably don't want. "Exactly-once delivery" is one of those phrases that sounds great until someone asks how.
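
"Idempotent consumer" usually means a dedup check on an idempotency key, as in this sketch. It assumes the producer sets a hypothetical `idempotency_key` field. The in-memory set stands in for a durable store, such as a table with a unique constraint on the key. In production, record the key in the same transaction as the side effect's own database write where you can. Otherwise a crash between the two reopens the duplicate window.

```python
class IdempotentConsumer:
    """At-least-once delivery plus a dedup store gives effectively-once side effects."""

    def __init__(self):
        self.seen_keys = set()      # stand-in for a durable dedup store
        self.emails_sent = []

    def handle(self, message: dict) -> None:
        key = message["idempotency_key"]   # hypothetical field set by the producer
        if key in self.seen_keys:
            return                         # duplicate delivery: already handled
        self.emails_sent.append(message["to"])
        self.seen_keys.add(key)

consumer = IdempotentConsumer()
msg = {"idempotency_key": "order-1001-confirmation", "to": "a@example.com"}
consumer.handle(msg)
consumer.handle(msg)          # broker redelivered after a lost ack
print(consumer.emails_sent)   # ['a@example.com']
```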

Backpressure and Replay

Backpressure is what happens when producers are faster than consumers. In a pull-based system like Kafka or SQS, consumers fetch at their own pace. A slow consumer just falls behind in the log. Messages accumulate, but the system doesn't collapse. Add consumers to recover. In Kafka, a consumer group can't usefully have more consumers than the topic has partitions, so you may need more partitions too.
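
You measure "falling behind" as consumer lag. For each partition, lag is the log's end offset minus the group's committed offset. The offsets below are made-up numbers, and both helpers are hypothetical. The second helper shows the partition ceiling on how many consumers can do useful work.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = newest offset in the log minus the group's committed offset."""
    return {p: end - committed_offsets.get(p, 0) for p, end in log_end_offsets.items()}

def useful_consumers(num_consumers, num_partitions):
    # In a Kafka consumer group each partition has at most one owner,
    # so consumers beyond the partition count sit idle.
    return min(num_consumers, num_partitions)

lag = consumer_lag({0: 1_500, 1: 1_200, 2: 900}, {0: 1_000, 1: 1_200, 2: 400})
print(lag, sum(lag.values()))   # {0: 500, 1: 0, 2: 500} 1000
print(useful_consumers(8, 3))   # 3
```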

In a push-based system like RabbitMQ or SNS, the broker sends messages as they arrive. A slow consumer can be overwhelmed. RabbitMQ's prefetch count limits how many unacknowledged messages a consumer holds at once, which naturally throttles the push rate. It's not magic, it's just the broker refusing to bury you faster than you can dig.
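
The class below models prefetch behavior only; it is not RabbitMQ's API. In real RabbitMQ clients you set the limit with basic.qos, for example `channel.basic_qos(prefetch_count=...)` in pika. The broker pushes only while the consumer holds fewer unacked messages than the limit. Each ack frees a slot.

```python
from collections import deque

class PushBroker:
    """Toy push broker with a RabbitMQ-style prefetch limit."""

    def __init__(self, prefetch):
        self.prefetch = prefetch
        self.pending = deque()
        self.unacked = []              # delivered to the consumer, not yet acked

    def publish(self, msg):
        self.pending.append(msg)
        self._push()

    def ack(self, msg):
        self.unacked.remove(msg)
        self._push()                   # an ack frees a slot, so the broker pushes more

    def _push(self):
        # Push only while the consumer holds fewer than `prefetch` unacked messages.
        while self.pending and len(self.unacked) < self.prefetch:
            self.unacked.append(self.pending.popleft())

broker = PushBroker(prefetch=2)
for i in range(5):
    broker.publish(f"m{i}")
print(broker.unacked, len(broker.pending))   # ['m0', 'm1'] 3
broker.ack("m0")
print(broker.unacked, len(broker.pending))   # ['m1', 'm2'] 2
```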

Replay is Kafka's distinguishing capability. Because messages are retained on disk rather than deleted on consumption, you can reset a consumer group's offset and reprocess historical events. Fix a bug in your consumer logic. Onboard a new service onto an existing stream. Reprocess a bad backfill. Traditional queues delete messages after consumption. No rewind, no second chances.
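
Replay amounts to moving an offset, as this toy retained log shows. It is not Kafka's client API. In Kafka itself you would use the consumer's seek calls or the `--reset-offsets` option of the kafka-consumer-groups tool. The scenario is hypothetical: a consumer bug double-counts, a fix ships, and the group rewinds to offset 0 to recompute.

```python
class RetainedLog:
    """Toy retained log: consuming advances an offset; it never deletes records."""

    def __init__(self):
        self.records = []
        self.offsets = {}                  # consumer group -> next offset

    def append(self, record):
        self.records.append(record)

    def poll(self, group):
        start = self.offsets.get(group, 0)
        batch = self.records[start:]
        self.offsets[group] = len(self.records)
        return batch

    def seek(self, group, offset):
        # Rewind (or skip ahead) this group's position; records are untouched.
        self.offsets[group] = offset

log = RetainedLog()
for amount in (10, 25, 40):
    log.append({"amount": amount})

buggy_total = sum(r["amount"] for r in log.poll("analytics")) * 2   # bug: double counts
log.seek("analytics", 0)                                             # fix deployed; rewind
fixed_total = sum(r["amount"] for r in log.poll("analytics"))
print(buggy_total, fixed_total)   # 150 75
```

A traditional queue cannot do this. Once the messages were acknowledged, they would already be gone.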

How to Bring It Up in an Interview

Most system design problems involving async processing, notifications, or multiple services reacting to events are opportunities to introduce one of these patterns. The candidate who spots it and names it wins. The candidate who either ignores it or names it incorrectly loses.

Start by identifying the need: "The order service needs to notify the email service, analytics pipeline, and fraud detection service asynchronously. Rather than having the order service call each one directly, I'd fan the event through a pub/sub system. This decouples the order service from its consumers and lets us add new consumers without modifying the producer."

Then defend your system choice: "I'd use Kafka here because we need message durability and the analytics pipeline will likely need to replay events during backfills. Kafka's consumer group model also lets us scale fraud detection workers independently from email workers."

Then name the tradeoffs: "We get at-least-once delivery, so consumers need to be idempotent. Ordering is per partition, which is fine since we're partitioning by user ID. We'd set retention to seven days so any consumer that falls behind has a window to catch up. Failed messages beyond retry thresholds go to a DLQ."

Four things interviewers want to hear: why async, queue or pub/sub, which system and why, and what the tradeoffs are. Hitting all four and moving on is a strong signal. Saying "I'd add Kafka" and then staring expectantly leaves the interviewer nothing to write down. You need them to write things down.

If you're practicing system design answers out loud, try SpaceComplexity. Voice-based mock interviews force you to verbalize architecture decisions the way an actual system design interview does, and the rubric-based feedback shows you exactly where your reasoning goes quiet.

Key Takeaways

Queue = work distribution (one consumer per message). Pub/sub = event fan-out (all subscribers get a copy). The rest follows from this.
Kafka handles both patterns: consumer groups act as queues, multiple groups act as pub/sub.
At-least-once is the practical default. Design consumers to be idempotent.
Kafka retains messages on disk. Traditional queues delete after consumption. Replay requires retention.
Ordering is per partition or per queue. Global ordering is an expensive design constraint you probably don't need.
Dead letter queues handle irrecoverable failures. Always mention them.
Pull-based systems (Kafka, SQS) handle backpressure naturally. Push-based systems (RabbitMQ, SNS) need explicit flow control, such as RabbitMQ's prefetch limit.

For more on how these patterns appear inside specific system design problems: