Push vs Pull Architecture: The System Design Interview Guide

- Push sends data immediately on each event; low latency but no built-in backpressure if consumers fall behind
- Pull lets consumers fetch when ready; backpressure is natural and consumers can be offline, but polling adds latency
- Kafka splits the model: producers push to the broker, consumers pull from it, giving each consumer independent pace control
- Fan-out on write vs read is the Twitter celebrity problem: push pre-populates feeds but breaks at 50M followers; the real answer is a hybrid
- Use three questions to decide: who controls pace, how many consumers, can the consumer be offline
You have two services that need to share data. One of them has to go first. The question is: which one knocks on the door, and which one sits there waiting for the knock?
That's push vs pull. And the answer cascades through your entire design. Pick the wrong one and your consumers drown in a flood of events they can't process, or you burn CPU polling an empty queue every 100ms like a golden retriever waiting at the door.
Push vs Pull: Who Knocks First
The entire difference is control over timing.
In a push model, the producer decides when data moves. It generates an event and fires it to consumers immediately. The consumer is passive, which is a polite way of saying it has no say in any of this. Better be ready.
In a pull model, the consumer decides when data moves. It asks the broker for the next batch when it's actually ready to handle something. The producer just holds data and waits, like a very patient mailbox.
Neither is strictly better. They solve different problems, and the right choice depends on latency requirements, how many consumers exist, and who should control the processing pace.
Push: The Producer Calls the Shots
When a producer pushes, it fires data immediately on each new event. No waiting. No polling interval. The consumer gets data as fast as the producer generates it, which is great until the consumer starts drowning.
Producer ──── event ──────────────────► Consumer A
         └── event (same) ───────────► Consumer B
         └── event (same) ───────────► Consumer C
Webhooks are the canonical example. Stripe detects a payment and immediately POSTs to your endpoint. You don't poll Stripe every second asking "did anything happen?" That would be deeply annoying and also expensive. Stripe tells you. Firebase Realtime Database, APNs/FCM push notifications, Server-Sent Events, Redis Pub/Sub: all push.
Latency is minimal and fan-out is natural: one event reaches many consumers in parallel. But push has no built-in backpressure. If the producer generates events faster than the consumer can process them, the consumer gets buried. The producer doesn't slow down. It just keeps pushing. The consumer either drops events, queues them until it runs out of memory, or falls over spectacularly.
Push also creates coupling. The producer has to know where to send data. Add a new consumer and you have to register it with the producer. Fine at small scale. A maintenance nightmare as the system grows.
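Here is a minimal sketch of both problems, assuming an in-process producer that calls consumers directly. `PushProducer`, `SlowConsumer`, and the inbox capacity are hypothetical names picked for illustration, not any real library's API.

```python
from collections import deque


class SlowConsumer:
    """Consumer with a bounded inbox; drops events once the inbox is full."""

    def __init__(self, capacity: int) -> None:
        self.inbox = deque()
        self.capacity = capacity
        self.dropped = 0

    def on_event(self, event: str) -> None:
        # The consumer has no channel to tell the producer to slow down.
        if len(self.inbox) >= self.capacity:
            self.dropped += 1
        else:
            self.inbox.append(event)


class PushProducer:
    """Producer that must know every consumer and pushes on each event."""

    def __init__(self) -> None:
        self.consumers = []

    def register(self, consumer: SlowConsumer) -> None:
        # Coupling: adding a consumer means touching the producer's registry.
        self.consumers.append(consumer)

    def emit(self, event: str) -> None:
        for consumer in self.consumers:
            consumer.on_event(event)  # fire immediately, ready or not


producer = PushProducer()
consumer = SlowConsumer(capacity=3)
producer.register(consumer)

for i in range(10):
    producer.emit(f"event-{i}")  # the consumer never gets a chance to drain

print(len(consumer.inbox), consumer.dropped)  # prints "3 7"
```

The producer never learns that seven events were dropped. Any slowdown has to be designed in; the push itself carries no feedback.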
Pull: The Consumer Controls the Pace
When a consumer pulls, it fetches a batch of data when it's ready to process more. The broker holds everything until requested, no matter how long that takes.
Consumer ──── request(offset) ─────────► Broker / Producer
         ◄─── batch of records ──────────
         [process...]
         ──── request(offset + n) ──────► Broker / Producer
         ◄─── next batch ─────────────────
Kafka is the most important example in system design interviews. Consumers poll the broker for the next batch from their current offset. When they're done processing, they commit the offset and ask for more. The consumer sets the pace, which is exactly what makes Kafka resilient under load.
Other examples: RSS feeds, HTTP polling, Prometheus scraping metrics from application endpoints, any REST API where the client decides when to query.
Backpressure is automatic. A slow consumer just doesn't request the next batch. The broker holds the data, unbothered. Consumers can go offline entirely and resume without loss, as long as the broker retains data long enough. Pull decouples producers from consumers: the producer doesn't know or care who's consuming, which is a liberating feeling for everyone involved.
The weakness is latency. There's always a delay between when data is produced and when the consumer asks for it. Poll every 100ms and you have up to 100ms of lag. Poll more frequently and you burn CPU on empty responses like a person refreshing their inbox hoping for good news. Poll less frequently and freshness suffers. Long polling narrows the gap: the request waits at the broker until data arrives or a timeout expires, which is what Kafka's fetch.min.bytes and fetch.max.wait.ms consumer settings control. You'll need to address this tension explicitly in any interview where staleness matters.
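The interval tradeoff is easy to put numbers on. This is a back-of-envelope sketch, not a benchmark: `polling_tradeoff` is a hypothetical helper, and it assumes fixed-interval polling, events that arrive independently of the poll schedule, and zero network or processing time.

```python
def polling_tradeoff(interval_ms: float, events_per_hour: float) -> dict:
    """Estimate lag and wasted polls for fixed-interval polling."""
    polls_per_hour = 3_600_000 / interval_ms
    # Each event can make at most one poll non-empty, so this bounds
    # the number of useful polls from above.
    max_useful_polls = min(polls_per_hour, events_per_hour)
    return {
        "worst_case_lag_ms": interval_ms,      # event lands right after a poll
        "mean_lag_ms": interval_ms / 2,        # uniform arrival assumption
        "polls_per_hour": polls_per_hour,
        "min_empty_fraction": 1 - max_useful_polls / polls_per_hour,
    }


for interval in (100, 1_000, 10_000):
    print(interval, polling_tradeoff(interval, events_per_hour=60))
```

At one event per minute, a 100ms interval means 36,000 polls an hour with at least 99.8% of them empty. Stretching the interval to 10 seconds cuts that to 360 polls, but the average lag grows to 5 seconds.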
The consumer also owns retry logic, offset tracking, and deduplication. In push, the producer handles retries. In pull, the consumer does. More complexity, more control.
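A rough sketch of what "the consumer owns it" looks like, assuming an in-memory log standing in for the broker. `Broker`, `PullConsumer`, and the `crash_before_commit` flag are hypothetical and exist only to simulate a failure between processing and committing.

```python
from __future__ import annotations


class Broker:
    """Append-only log of (record_id, payload); holds data until asked."""

    def __init__(self) -> None:
        self.log: list[tuple[int, str]] = []

    def append(self, payload: str) -> None:
        self.log.append((len(self.log), payload))

    def fetch(self, offset: int, max_records: int) -> list[tuple[int, str]]:
        return self.log[offset : offset + max_records]


class PullConsumer:
    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.committed = 0               # next offset to read; assumed durable
        self.seen_ids: set[int] = set()  # dedup state; assumed durable
        self.applied: list[str] = []

    def poll_once(self, max_records: int = 3, crash_before_commit: bool = False) -> int:
        batch = self.broker.fetch(self.committed, max_records)
        for record_id, payload in batch:
            if record_id in self.seen_ids:
                continue                 # redelivered after a crash: skip it
            self.applied.append(payload)
            self.seen_ids.add(record_id)
        if crash_before_commit:
            return 0                     # died after processing, before commit
        self.committed += len(batch)     # commit only after processing
        return len(batch)


broker = Broker()
for i in range(5):
    broker.append(f"r{i}")               # produced while the consumer is offline

consumer = PullConsumer(broker)
consumer.poll_once(crash_before_commit=True)  # r0..r2 processed, not committed
while consumer.poll_once():                   # restart: r0..r2 are redelivered
    pass

print(consumer.applied, consumer.committed)
```

Committing after processing gives at-least-once delivery, so the redelivered batch has to be filtered by ID. Committing before processing would avoid duplicates but risk losing records on a crash.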
The Tradeoffs in One Table
| Dimension | Push | Pull |
|---|---|---|
| Latency | Low (event-driven) | Bounded by polling interval |
| Backpressure | Must build explicitly | Natural, built-in |
| Consumer availability | Must be up when event fires | Can be offline, resume later |
| Producer coupling | High (knows consumer endpoints) | Low (decoupled via broker) |
| Fan-out | Simple | Requires each consumer to poll |
| Empty-poll waste | None | Real cost at low event rates |
| Retry on failure | Producer's problem | Consumer's problem |
| State tracking | Producer tracks delivery | Consumer tracks offset |
Kafka Is Neither (and That's the Point)
The most common misclassification in interviews is calling Kafka a push system because "producers push messages in." Producers do push to the broker. But consumers pull from it. Kafka splits the model at the broker boundary, and that split is the whole design insight.
Producer ─── push ───► Broker (retains log) ◄─── pull ─── Consumer A
                                            ◄─── pull ─── Consumer B
                                            ◄─── pull ─── Consumer C
This is why Kafka handles wildly different consumer speeds gracefully. A slow consumer doesn't block a fast one. A crashed consumer doesn't cause the broker to drop messages. Each consumer has its own offset and its own pace. They could be running on completely different hardware, deployed in completely different regions, processing at completely different rates, and Kafka genuinely does not care.
When you say "Kafka" in an interview, the follow-up is always: "and why does the consumer pull rather than have the broker push?" The answer: consumer-controlled pace is backpressure by default. A broker pushing to consumers would need to track each consumer's throughput, implement per-consumer rate limiting, and buffer data per-consumer when it falls behind. Pulling makes all of that unnecessary. The consumer just asks when it's ready, like a civilized adult.
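The split can be sketched in a few lines. This is a toy model, not Kafka's client API: `LogBroker` and `Consumer` are hypothetical, there is one log with no partitions, and each consumer keeps its offset in memory, whereas Kafka tracks committed offsets per consumer group and partition.

```python
from __future__ import annotations


class LogBroker:
    """In-memory stand-in for a retained log (one topic, one partition)."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def produce(self, record: str) -> int:
        # Producer side: push. The broker appends and returns the offset.
        self.log.append(record)
        return len(self.log) - 1

    def fetch(self, offset: int, max_records: int) -> list[str]:
        # Consumer side: pull. The broker only answers requests.
        return self.log[offset : offset + max_records]


class Consumer:
    def __init__(self, broker: LogBroker, batch_size: int) -> None:
        self.broker = broker
        self.batch_size = batch_size
        self.offset = 0  # each consumer tracks its own position

    def poll(self) -> list[str]:
        batch = self.broker.fetch(self.offset, self.batch_size)
        self.offset += len(batch)
        return batch

    def lag(self) -> int:
        return len(self.broker.log) - self.offset


broker = LogBroker()
for i in range(10):
    broker.produce(f"msg-{i}")

fast = Consumer(broker, batch_size=5)
slow = Consumer(broker, batch_size=1)

fast.poll()
fast.poll()  # the fast consumer has read everything
slow.poll()  # the slow consumer has read one record

print(fast.lag(), slow.lag(), len(broker.log))  # prints "0 9 10"
```

The broker holds no per-consumer buffer and no rate limiter. A slow consumer shows up only as a larger lag number, and the fast one is unaffected.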
Three Questions That Drive the Decision
When you hit a data-flow decision in an interview, run through these in order.
1. Who should control the processing pace?
If the consumer is the bottleneck and you care about not overwhelming it, use pull. If the producer is generating rare, time-sensitive events and you need to react immediately, use push.
2. How many consumers does one event need to reach?
Push fan-out is conceptually simple: emit once, deliver to N consumers. But it requires the producer to maintain a consumer registry and handle N independent delivery failures. Pull with a shared broker scales fan-out more cleanly: consumers opt in by subscribing, and the broker handles retention.
3. Can the consumer be temporarily offline?
If yes, you need a broker that retains messages until the consumer reconnects. That's a pull system. If delivery has to be producer-initiated even though the consumer is sometimes unreachable (like a mobile device receiving a notification), you need push backed by a durable buffer at the edge. APNs and FCM serve this role: they hold pushes for a limited time while the device is offline and deliver when it reconnects.
One thing to flag explicitly: if you pick push and your consumers can fall behind, backpressure won't appear on its own. You need to build it in. Usually that means rate limiting at the producer, a queue in front of the consumer, or a circuit breaker that signals the producer to slow down when consumer queue depth exceeds a threshold. Missing this in an interview is a common way to lose points on otherwise solid designs. The interviewer has seen dozens of candidates draw a beautiful push architecture and then stare blankly when asked "what happens if the consumer is slow?"
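One way to sketch the "queue in front of the consumer" option, assuming a bounded queue whose refusal is the slow-down signal. `BackpressuredEndpoint` and `push_with_backoff` are hypothetical names; the only real API used is Python's standard `queue` module.

```python
import queue


class BackpressuredEndpoint:
    """Consumer-side bounded queue that tells the producer when to back off."""

    def __init__(self, max_depth: int) -> None:
        self.inbox = queue.Queue(maxsize=max_depth)

    def offer(self, event: str) -> bool:
        try:
            self.inbox.put_nowait(event)
            return True
        except queue.Full:
            return False  # signal to the producer: slow down, retry later

    def drain(self, n: int) -> list:
        out = []
        for _ in range(n):
            try:
                out.append(self.inbox.get_nowait())
            except queue.Empty:
                break
        return out


def push_with_backoff(endpoint: BackpressuredEndpoint, events: list) -> list:
    """Push events in order; return the ones the consumer refused."""
    pending = []
    for event in events:
        # Once one event is refused, hold everything after it to keep order.
        if pending or not endpoint.offer(event):
            pending.append(event)
    return pending


endpoint = BackpressuredEndpoint(max_depth=3)
pending = push_with_backoff(endpoint, [f"e{i}" for i in range(6)])
print(pending)                      # refused: ['e3', 'e4', 'e5']

handled = endpoint.drain(2)         # the consumer catches up a little
pending = push_with_backoff(endpoint, pending)
print(handled, pending)             # ['e0', 'e1'] ['e5']
```

The cost moves rather than disappears: the producer now has to hold the pending events somewhere and decide when to retry, which is the state a broker would otherwise keep for you.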
Where This Pattern Shows Up
News feed fan-out. The classic Twitter design question is push vs pull for fan-out. Push on write pre-populates each follower's feed at tweet time. Pull on read aggregates feeds at request time. Push gives low read latency but is catastrophically expensive for celebrities: one tweet from a 50M-follower account means 50M writes. The servers will have opinions about this. The real answer is a hybrid: push for normal accounts, pull for high-follower accounts. Knowing that the celebrity problem breaks pure push, and naming the hybrid, is what interviewers want to hear.
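A minimal sketch of the hybrid, assuming an in-memory store. `FeedService` and `CELEBRITY_THRESHOLD` are hypothetical; the threshold is set to 3 only so the demo stays readable, and a real system would choose it from measured write cost.

```python
from __future__ import annotations

from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff, tiny for the demo


class FeedService:
    def __init__(self) -> None:
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.feeds = defaultdict(list)     # user -> pre-populated feed items
        self.posts = defaultdict(list)     # author -> their own posts
        self.clock = 0                     # logical timestamp for ordering
        self.fanout_writes = 0

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author: str, text: str) -> None:
        self.clock += 1
        item = (self.clock, author, text)
        self.posts[author].append(item)
        if self.is_celebrity(author):
            return  # pull path: followers fetch this at read time
        for follower in self.followers[author]:
            self.feeds[follower].append(item)  # push path: fan-out on write
            self.fanout_writes += 1

    def read_feed(self, user: str) -> list[str]:
        items = set(self.feeds[user])  # set drops items present on both paths
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts[author])  # fan-out on read
        return [text for _, _, text in sorted(items, reverse=True)]


svc = FeedService()
for fan in ("u1", "u2", "u3"):
    svc.follow(fan, "celeb")
svc.follow("u1", "alice")

svc.post("alice", "hello from alice")  # 1 follower: pushed to u1's feed
svc.post("celeb", "hello from celeb")  # 3 followers: stored once, not pushed

print(svc.fanout_writes)    # prints 1
print(svc.read_feed("u1"))  # ['hello from celeb', 'hello from alice']
```

The read path pays a merge for every celebrity a user follows, which is the price for skipping the large write fan-out. Accounts that cross the threshold mid-life need handling that this sketch only covers by deduplicating at read time.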
Monitoring systems. Prometheus uses pull. It scrapes /metrics endpoints on a configurable interval, which lets Prometheus control its own scrape load and keeps targets simple to implement. Datadog's agent uses push: it collects metrics locally and ships them to Datadog's intake. Push works here because the agent is always running and the aggregation endpoint is purpose-built for high write throughput.
CDN cache invalidation. When content changes, the origin can push invalidation events to edge nodes (active invalidation), or edge nodes can let objects expire and pull fresh content on the next miss (TTL-based pull-through). Active invalidation gives lower latency but requires the origin to know about every edge node. Pull-through is simpler to operate but serves stale content during the TTL window. Nobody wants to explain to a product manager why their urgent homepage update is still showing the old version in Singapore.
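The two strategies side by side, as a sketch that assumes a dict standing in for the origin and a caller-supplied clock so the timing is deterministic. `EdgeCache` is a hypothetical class, not any CDN's API.

```python
class EdgeCache:
    """Edge node that pulls through on miss and accepts pushed invalidations."""

    def __init__(self, origin: dict, ttl: float) -> None:
        self.origin = origin  # stands in for the origin server
        self.ttl = ttl
        self.store = {}       # key -> (value, expires_at)

    def get(self, key: str, now: float) -> str:
        entry = self.store.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                       # cache hit, possibly stale
        value = self.origin[key]                  # miss or expired: pull
        self.store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Called by the origin: this is the push path.
        self.store.pop(key, None)


origin = {"home": "v1"}
ttl_edge = EdgeCache(origin, ttl=60)
push_edge = EdgeCache(origin, ttl=60)
ttl_edge.get("home", now=0)
push_edge.get("home", now=0)

origin["home"] = "v2"            # content changes at the origin
push_edge.invalidate("home")     # origin must know this edge exists

stale = ttl_edge.get("home", now=30)       # still "v1" inside the TTL window
refreshed = ttl_edge.get("home", now=60)   # "v2" once the entry expires
after_push = push_edge.get("home", now=30) # "v2" right away

print(stale, refreshed, after_push)  # prints "v1 v2 v2"
```

The `invalidate` call is where the coupling lives: the origin needs a list of every edge to call it on, while the TTL edge needs nothing from the origin except an answer on a miss.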
Notification systems. Push is mandatory. You can't poll APNs or FCM asking "are there any notifications for device X?" The device registers a token, the server pushes to APNs/FCM, and they forward to the device. The design question is how you buffer when the device is offline, not whether to use push.