Push vs Pull Architecture: The System Design Interview Guide

You have two services that need to share data. One of them has to go first. The question is: which one knocks on the door, and which one sits there waiting for the knock?

That's push vs pull. And the answer cascades through your entire design. Pick the wrong one and your consumers drown in a flood of events they can't process, or you burn CPU polling an empty queue every 100ms like a golden retriever waiting at the door.

Push vs Pull: Who Knocks First

The entire difference is control over timing.

In a push model, the producer decides when data moves. It generates an event and fires it to consumers immediately. The consumer is passive, which is a polite way of saying it has no say in any of this. Better be ready.

In a pull model, the consumer decides when data moves. It asks the broker for the next batch when it's actually ready to handle something. The producer just holds data and waits, like a very patient mailbox.

Neither is strictly better. They solve different problems, and the right choice depends on latency requirements, how many consumers exist, and who should control the processing pace.

Push: The Producer Calls the Shots

When a producer pushes, it fires data immediately on each new event. No waiting. No polling interval. The consumer gets data as fast as the producer generates it, which is great until the consumer starts drowning.

Producer ──── event ──────────────────► Consumer A
          └── event (same) ───────────► Consumer B
          └── event (same) ───────────► Consumer C

Webhooks are the canonical example. Stripe detects a payment and immediately POSTs to your endpoint. You don't poll Stripe every second asking "did anything happen?" That would be deeply annoying and also expensive. Stripe tells you. Firebase Realtime Database, APNs/FCM push notifications, Server-Sent Events, Redis Pub/Sub: all push.

Latency is minimal and fan-out is natural: one event reaches many consumers in parallel. But push has no built-in backpressure. If the producer generates events faster than the consumer can process them, the consumer gets buried. The producer doesn't slow down. It just keeps pushing. The consumer either drops events, queues them until it runs out of memory, or falls over spectacularly.

Push also creates coupling. The producer has to know where to send data. Add a new consumer and you have to register it with the producer. Fine at small scale. A maintenance nightmare as the system grows.
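To make the fan-out and the coupling concrete, here is a minimal in-process sketch. `PushProducer`, `register`, and `emit` are hypothetical names invented for illustration, and plain function calls stand in for the network calls a real system would make.

```python
from typing import Callable

Handler = Callable[[dict], None]


class PushProducer:
    """Hypothetical producer that delivers each event to every registered handler."""

    def __init__(self) -> None:
        self._subscribers: dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Coupling: the producer has to be told about every consumer.
        self._subscribers[name] = handler

    def emit(self, event: dict) -> list[str]:
        """Push the event to all subscribers now; return names whose delivery failed."""
        failed = []
        for name, handler in self._subscribers.items():
            try:
                handler(event)
            except Exception:
                # In push, delivery failures land on the producer (retry, dead-letter, ...).
                failed.append(name)
        return failed


received_a: list[dict] = []
received_b: list[dict] = []


def flaky_consumer(event: dict) -> None:
    raise RuntimeError("consumer is down")


producer = PushProducer()
producer.register("a", received_a.append)
producer.register("b", received_b.append)
producer.register("c", flaky_consumer)

failed = producer.emit({"type": "payment.succeeded", "id": 1})
```

Note that nothing here lets a consumer say "not yet": `emit` calls every handler the moment an event exists, and the producer is left holding the list of failures.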

Pull: The Consumer Controls the Pace

When a consumer pulls, it fetches a batch of data when it's ready to process more. The broker holds everything until requested, for as long as its retention window allows.

Consumer ──── request(offset) ─────────► Broker / Producer
         ◄─── batch of records ──────────
         [process...]
         ──── request(offset + n) ──────► Broker / Producer
         ◄─── next batch ─────────────────

Kafka is the most important example in system design interviews. Consumers poll the broker for the next batch from their current offset. When they're done processing, they commit the offset and ask for more. The consumer sets the pace, which is exactly what makes Kafka resilient under load.
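The poll, process, commit loop can be sketched with an in-memory stand-in for the broker. This is not Kafka's client API: `Broker`, `PullConsumer`, and `poll_once` are hypothetical names, and the "log" is just a Python list.

```python
class Broker:
    """Hypothetical in-memory broker: an append-only log read by offset."""

    def __init__(self) -> None:
        self._log: list[str] = []

    def append(self, record: str) -> None:
        self._log.append(record)

    def fetch(self, offset: int, max_records: int) -> list[str]:
        # Returns an empty list when the consumer has caught up.
        return self._log[offset : offset + max_records]


class PullConsumer:
    def __init__(self, broker: Broker, batch_size: int = 2) -> None:
        self.broker = broker
        self.batch_size = batch_size
        self.committed_offset = 0
        self.processed: list[str] = []

    def poll_once(self) -> int:
        """Fetch one batch from the committed offset, process it, then commit."""
        batch = self.broker.fetch(self.committed_offset, self.batch_size)
        for record in batch:
            self.processed.append(record.upper())  # stand-in for real work
        # Commit only after the whole batch is processed.
        self.committed_offset += len(batch)
        return len(batch)


broker = Broker()
for i in range(5):
    broker.append(f"event-{i}")

consumer = PullConsumer(broker, batch_size=2)
batch_sizes = []
while (n := consumer.poll_once()) > 0:
    batch_sizes.append(n)
```

The consumer never receives more than `batch_size` records at a time, and the broker does nothing until asked.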

Other examples: RSS feeds, HTTP polling, Prometheus scraping metrics from application endpoints, any REST API where the client decides when to query.

Backpressure is automatic. A slow consumer just doesn't request the next batch. The broker holds the data, unbothered. Consumers can go offline entirely and resume without loss, as long as the broker retains data long enough. Pull decouples producers from consumers: the producer doesn't know or care who's consuming, which is a liberating feeling for everyone involved.

The weakness is latency. There's always a delay between when data is produced and when the consumer asks for it. Poll every 100ms and you have up to 100ms of lag. Poll more frequently and you burn CPU on empty responses like a person refreshing their inbox hoping for good news. Poll less frequently and freshness suffers. Long polling softens the tradeoff: the request waits at the broker until data arrives or a timeout expires, which is how Kafka consumers avoid hammering an empty topic. You'll need to address this tension explicitly in any interview where staleness matters.
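To put rough numbers on fixed-interval polling, here is a back-of-the-envelope helper. `polling_tradeoff` is a hypothetical function, and the average-lag figure assumes events arrive at uniformly random points within a polling interval and ignores network and processing time.

```python
def polling_tradeoff(interval_ms: float, events_per_s: float) -> dict[str, float]:
    """Estimate lag and wasted polls for fixed-interval polling."""
    polls_per_s = 1000.0 / interval_ms
    # A non-empty poll needs at least one event, so at most `events_per_s`
    # polls per second can return data. The rest must come back empty.
    min_empty_polls_per_s = max(0.0, polls_per_s - events_per_s)
    return {
        "worst_case_lag_ms": interval_ms,
        "average_lag_ms": interval_ms / 2,  # assumes uniform arrival within an interval
        "polls_per_s": polls_per_s,
        "min_empty_polls_per_s": min_empty_polls_per_s,
    }


fast = polling_tradeoff(interval_ms=100, events_per_s=1)
slow = polling_tradeoff(interval_ms=1000, events_per_s=1)
```

Under these assumptions, polling every 100ms for a stream of one event per second caps lag at 100ms but wastes at least 9 of every 10 polls. Polling once per second can eliminate the waste at the cost of up to a full second of staleness.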

The consumer also owns retry logic, offset tracking, and deduplication. In push, the producer handles retries. In pull, the consumer does. More complexity, more control.

The Tradeoffs in One Table

| Dimension | Push | Pull |
| --- | --- | --- |
| Latency | Low (event-driven) | Bounded by polling interval |
| Backpressure | Must build explicitly | Natural, built-in |
| Consumer availability | Must be up when event fires | Can be offline, resume later |
| Producer coupling | High (knows consumer endpoints) | Low (decoupled via broker) |
| Fan-out | Simple | Requires each consumer to poll |
| Empty-poll waste | None | Real cost at low event rates |
| Retry on failure | Producer's problem | Consumer's problem |
| State tracking | Producer tracks delivery | Consumer tracks offset |

Kafka Is Neither (and That's the Point)

A common misclassification in interviews is calling Kafka a push system because "producers push messages in." Producers do push to the broker. But consumers pull from it. Kafka splits the model at the broker boundary, and that split is the whole design insight.

Producer ─── push ───► Broker (retains log) ◄─── pull ─── Consumer A
                                             ◄─── pull ─── Consumer B
                                             ◄─── pull ─── Consumer C

This is why Kafka handles wildly different consumer speeds gracefully. A slow consumer doesn't block a fast one. A crashed consumer doesn't cause the broker to drop messages. Each consumer has its own offset and its own pace. They could be running on completely different hardware, deployed in completely different regions, processing at completely different rates, and Kafka genuinely does not care.
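The "own offset, own pace" idea can be sketched with one shared log and a per-consumer read position. This is a simplification, not Kafka's actual design: `LogBroker` and its methods are hypothetical, the offset advances as soon as a batch is handed out, and there are no partitions or consumer groups.

```python
class LogBroker:
    """Hypothetical shared log with an independent read offset per consumer."""

    def __init__(self) -> None:
        self._log: list[str] = []
        self._offsets: dict[str, int] = {}  # consumer_id -> next offset to read

    def produce(self, record: str) -> None:
        # Producers push into the log; nothing is sent to consumers here.
        self._log.append(record)

    def poll(self, consumer_id: str, max_records: int) -> list[str]:
        # Simplification: the offset is advanced on fetch, not on explicit commit.
        start = self._offsets.get(consumer_id, 0)
        batch = self._log[start : start + max_records]
        self._offsets[consumer_id] = start + len(batch)
        return batch

    def lag(self, consumer_id: str) -> int:
        """Records produced but not yet read by this consumer."""
        return len(self._log) - self._offsets.get(consumer_id, 0)


log = LogBroker()
for i in range(10):
    log.produce(f"event-{i}")

fast_batch = log.poll("fast", max_records=10)  # keeps up with the producer
slow_batch = log.poll("slow", max_records=3)   # takes only what it can handle

lags = {name: log.lag(name) for name in ("fast", "slow", "crashed")}
```

The slow consumer being 7 records behind costs the fast one nothing, and the consumer that never polled still has all 10 records waiting for it.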

When you say "Kafka" in an interview, the follow-up is always: "and why does the consumer pull rather than have the broker push?" The answer: consumer-controlled pace is backpressure by default. A broker pushing to consumers would need to track each consumer's throughput, implement per-consumer rate limiting, and buffer data per-consumer when it falls behind. Pulling makes all of that unnecessary. The consumer just asks when it's ready, like a civilized adult.

[Figure: push vs pull at the broker boundary. Producers push in, consumers pull out, each at their own pace.]

Three Questions That Drive the Decision

When you hit a data-flow decision in an interview, run through these in order.

1. Who should control the processing pace?

If the consumer is the bottleneck and you care about not overwhelming it, use pull. If the producer is generating rare, time-sensitive events and you need to react immediately, use push.

2. How many consumers does one event need to reach?

Push fan-out is conceptually simple: emit once, deliver to N consumers. But it requires the producer to maintain a consumer registry and handle up to N delivery failures per event. Pull with a shared broker scales fan-out more cleanly: consumers opt in by subscribing, and the broker handles retention.

3. Can the consumer be temporarily offline?

If yes, you need a broker that retains messages until the consumer reconnects. That's a pull system. If the consumer can't poll and has to be reached by push even though it's sometimes offline (like a mobile device receiving a notification), you need push backed by a buffer at the edge. APNs and FCM serve this role: they hold pending notifications while the device is offline and deliver when it reconnects, within limits on how much they store and for how long.

One thing to flag explicitly: if you pick push and your consumers can fall behind, backpressure won't appear on its own. You need to build it in. Usually that means rate limiting at the producer, a queue in front of the consumer, or a circuit breaker that signals the producer to slow down when consumer queue depth exceeds a threshold. Missing this in an interview is a common way to lose points on otherwise solid designs. The interviewer has seen dozens of candidates draw a beautiful push architecture and then stare blankly when asked "what happens if the consumer is slow?"
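One of those options, a bounded queue in front of the consumer, might look like the sketch below. `BufferedConsumer`, `offer`, `drain`, and `push_all` are hypothetical names, and returning `False` is a stand-in for whatever "slow down" signal a real system would use (an HTTP 429, a tripped circuit breaker, and so on).

```python
import queue


class BufferedConsumer:
    """Hypothetical push consumer with a bounded inbox as its backpressure signal."""

    def __init__(self, capacity: int) -> None:
        self._inbox: queue.Queue = queue.Queue(maxsize=capacity)
        self.handled: list[int] = []

    def offer(self, event: int) -> bool:
        """Called by the producer. False means 'inbox full, slow down'."""
        try:
            self._inbox.put_nowait(event)
            return True
        except queue.Full:
            return False

    def drain(self, max_events: int) -> None:
        """Process up to max_events from the inbox."""
        for _ in range(max_events):
            try:
                self.handled.append(self._inbox.get_nowait())
            except queue.Empty:
                break


def push_all(events: list[int], consumer: BufferedConsumer) -> list[int]:
    """Push every event; return the rejected ones so the producer can retry later."""
    return [event for event in events if not consumer.offer(event)]


consumer = BufferedConsumer(capacity=3)
rejected = push_all([0, 1, 2, 3, 4], consumer)  # inbox fills after three events
consumer.drain(2)                               # consumer catches up a little
rejected_again = push_all(rejected, consumer)   # retry now fits
```

The bound is the point: an unbounded queue only postpones the out-of-memory failure, while a bounded one forces the producer to decide what to do with the overflow.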

Where This Pattern Shows Up

News feed fan-out. The classic Twitter design question is push vs pull for fan-out. Push on write pre-populates each follower's feed at tweet time. Pull on read aggregates feeds at request time. Push gives low read latency but is catastrophically expensive for celebrities: one tweet from a 50M-follower account means 50M writes. The servers will have opinions about this. The real answer is a hybrid: push for normal accounts, pull for high-follower accounts. Knowing that the celebrity problem breaks pure push, and naming the hybrid, is what interviewers want to hear.
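A toy version of the hybrid, assuming a follower-count cutoff decides which path an author takes. `FeedService` and `CELEBRITY_THRESHOLD` are hypothetical, the threshold is tiny so the demo stays readable, and a counter stands in for timestamps.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems would use a far larger one


class FeedService:
    """Hypothetical hybrid feed: push on write for normal authors, pull on read for celebrities."""

    def __init__(self) -> None:
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> entries pushed at write time
        self.outbox = defaultdict(list)    # author -> their own posts
        self._clock = 0

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author: str, text: str) -> int:
        """Store the post and return how many fan-out writes were performed."""
        self._clock += 1
        entry = (self._clock, author, text)
        self.outbox[author].append(entry)
        if self._is_celebrity(author):
            return 0  # followers will pull this at read time
        for follower in self.followers[author]:
            self.inbox[follower].append(entry)
        return len(self.followers[author])

    def read_feed(self, user: str) -> list[str]:
        """Merge pushed entries with celebrity posts pulled now, newest first."""
        entries = set(self.inbox[user])
        for author in self.following[user]:
            if self._is_celebrity(author):
                entries.update(self.outbox[author])
        return [text for _, _, text in sorted(entries, reverse=True)]


feed = FeedService()
for user in ("u1", "u2", "u3", "u4"):
    feed.follow(user, "celeb")
feed.follow("u1", "friend")

friend_writes = feed.post("friend", "hi")
celeb_writes = feed.post("celeb", "hello world")
u1_feed = feed.read_feed("u1")
u2_feed = feed.read_feed("u2")
```

The celebrity's post costs zero writes at post time. The price moves to read time, where each follower's feed does one extra lookup per celebrity they follow.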

Monitoring systems. Prometheus uses pull. It scrapes /metrics endpoints on a configurable interval, which lets Prometheus control its own scrape load and keeps targets simple to implement. Datadog's agent uses push: it collects metrics locally and ships them to Datadog's intake. Push works here because the agent is always running and the aggregation endpoint is purpose-built for high write throughput.

CDN cache invalidation. When content changes, the origin can push invalidation events to edge nodes (active invalidation), or edge nodes can let objects expire and pull fresh content on the next miss (TTL-based pull-through). Active invalidation gives lower latency but requires the origin to know about every edge node. Pull-through is simpler to operate but serves stale content during the TTL window. Nobody wants to explain to a product manager why their urgent homepage update is still showing the old version in Singapore.
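Both strategies fit in one small sketch. `EdgeCache` is a hypothetical class, a dict stands in for the origin, and the caller passes `now` explicitly so the staleness window is easy to see.

```python
class EdgeCache:
    """Hypothetical edge node: TTL-based pull-through plus optional active invalidation."""

    def __init__(self, origin: dict[str, str], ttl_s: float) -> None:
        self.origin = origin
        self.ttl_s = ttl_s
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, now: float) -> str:
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit, possibly stale
        # Miss or expired: pull fresh content from the origin.
        value = self.origin[key]
        self._entries[key] = (value, now + self.ttl_s)
        return value

    def invalidate(self, key: str) -> None:
        # Active invalidation: the origin pushes this call to every edge node.
        self._entries.pop(key, None)


origin = {"/home": "v1"}
edge = EdgeCache(origin, ttl_s=60)

first = edge.get("/home", now=0)        # miss, pulled from origin
origin["/home"] = "v2"                  # content changes at the origin
stale = edge.get("/home", now=30)       # still inside the TTL window
refreshed = edge.get("/home", now=60)   # TTL expired, pulled again

origin["/home"] = "v3"
edge.invalidate("/home")                # pushed invalidation
after_push = edge.get("/home", now=61)  # fresh without waiting for the TTL
```

With pull-through alone, the update to `v2` stays invisible for the rest of the TTL window. The pushed invalidation makes `v3` visible on the next request, but only because something knew to call `invalidate` on this particular edge node.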

Notification systems. Push is mandatory. You can't poll APNs or FCM asking "are there any notifications for device X?" The device registers a token, the server pushes to APNs/FCM, and they forward to the device. The design question is how you buffer when the device is offline, not whether to use push.

If you want to practice explaining this reasoning out loud under time pressure, SpaceComplexity runs voice-based system design mocks where you talk through exactly these decisions and get rubric-based feedback on your reasoning. The difference between a system design answer that lands and one that doesn't is almost always in the explanation, not the diagram.