WebSockets vs Long Polling vs SSE: Which One Does Your System Need?

June 3, 2026 · 10 min read
Tags: system-design, interview-prep, career, algorithms
TL;DR
  • Long polling works everywhere but pays 1,400+ bytes of HTTP header overhead per round trip; treat it as a fallback, not a first choice
  • Server-sent events (SSE) stream over a single persistent connection with built-in browser reconnection via Last-Event-ID; no sticky sessions required
  • WebSockets give full-duplex at 2-14 bytes per frame; the only option when the client sends high-frequency data like chat messages or game inputs
  • The one-way vs two-way question collapses the decision immediately: SSE for server-push, WebSocket for bidirectional traffic
  • Sticky sessions are the WebSocket follow-up trap; pair cookie or IP-hash affinity at the load balancer with Redis Pub/Sub for cross-server routing
  • SSE is the dominant streaming choice for AI products today; OpenAI and Anthropic both use it to stream responses token by token

Picture this: the interviewer asks how your notification system delivers real-time updates to users. You say "WebSockets" with confidence. They ask why. You say "because it's real-time." They stare at you. You stare back. Someone blinks.

That's not a protocol choice. That's a prayer.

Long polling, SSE, and WebSocket each solve the server-push problem differently. They arrived in roughly that order, and all three are still running in production today. Knowing which one to reach for, and being able to explain why out loud, is what actually gets scored.

HTTP Doesn't Call You Back

There is no mechanism in HTTP for the server to push data unless the client asked first. The client sends a request. The server answers. Connection closes. That's it.

Fine for loading a page. Useless if your product needs to tell a user something happened without them refreshing.

The server never texts first. It waits for you to reach out. The three approaches below each exploit a different loophole in that model.

Long Polling: The Duct Tape That Holds Production Together

The idea is embarrassingly simple. The client sends a request. The server holds it open instead of responding immediately. When the server has something to say, it responds. The client gets the data, then immediately fires off another request.

Client          Server
  |--- GET /events ----------->|
  |                            | (waiting...)
  |                            | (new event arrives)
  |<-- 200 {data} -------------|
  |--- GET /events ----------->|   ← immediate retry
  |                            | (waiting again...)

It works. It's also the duct tape of real-time web development, and everyone knows it, and it's still everywhere.
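The hold-then-retry loop is easier to see in code than in prose. What follows is a minimal sketch, not a real HTTP server: `LongPollServer`, `handle_get_events`, and `poll_loop` are hypothetical names, and an in-memory queue stands in for the network so the example runs on its own.

```python
import queue
import threading
from typing import List, Optional


class LongPollServer:
    """In-memory stand-in for an endpoint that holds requests open (hypothetical)."""

    def __init__(self, hold_seconds: float = 30.0) -> None:
        self.hold_seconds = hold_seconds
        self._events: "queue.Queue[str]" = queue.Queue()

    def publish(self, event: str) -> None:
        """Something happened on the server side."""
        self._events.put(event)

    def handle_get_events(self) -> Optional[str]:
        """Park the request until an event arrives or the hold window expires."""
        try:
            return self._events.get(timeout=self.hold_seconds)
        except queue.Empty:
            return None  # a real server would send an empty response here


def poll_loop(server: LongPollServer, wanted: int) -> List[str]:
    """Client side: re-issue the request as soon as the previous one returns."""
    received: List[str] = []
    while len(received) < wanted:
        event = server.handle_get_events()
        if event is not None:
            received.append(event)
        # No sleep between iterations: the immediate retry is the whole trick.
    return received
```

To try it, publish from another thread while the client is parked, for example `threading.Timer(0.05, server.publish, args=("hello",)).start()` followed by `poll_loop(server, wanted=1)`.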

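Latency depends on when the event lands. If a request is already parked at the server, delivery takes about one network hop. If the event arrives in the gap between a response and the next request, it waits for the client to reconnect. The "half your timeout window" figure (roughly 15 seconds at a 30-second window) describes plain interval polling, or a long-poll server that only checks for new data when the hold expires. That's not great for a chat app. It's fine for a dashboard that refreshes once a minute, where no one cares if it's a few seconds slow.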

The real cost of long polling isn't latency. It's header overhead. Every round trip carries 1-2 KB of HTTP headers: cookies, user-agent, referrer, content negotiation, the whole party. At 10,000 connected clients each polling every 30 seconds, that's 20,000 requests per minute and roughly 28 MB per minute in headers alone, before you've sent a single byte of actual payload.
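The arithmetic behind those numbers is quick to check. This sketch assumes 1,400 bytes of headers per request, the figure used elsewhere in this article; the function name is made up for the example.

```python
def header_overhead_per_minute(clients: int, poll_interval_s: float, header_bytes: int):
    """Return (requests per minute, header bytes per minute) for a polling fleet."""
    requests_per_minute = clients * (60 / poll_interval_s)
    return requests_per_minute, requests_per_minute * header_bytes


# 10,000 clients, one request every 30 seconds, 1,400 bytes of headers each.
requests, total_bytes = header_overhead_per_minute(10_000, 30, 1_400)
print(f"{requests:,.0f} requests/min, {total_bytes / 1e6:.0f} MB/min of headers")
# prints: 20,000 requests/min, 28 MB/min of headers
```

Halve the interval and both numbers double, which is why tightening the poll window to chase latency gets expensive fast.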

The saving grace: long polling works everywhere. Every browser, every corporate proxy, every CDN knows exactly what to do with an HTTP GET. This is why libraries like Socket.IO and SockJS quietly fall back to long polling when the WebSocket upgrade fails. It's the basement apartment of real-time protocols. Not glamorous. Always available.

Bring it up to show you know the tradeoffs. Then recommend something better.

Server-Sent Events: One Connection, One Direction, Surprisingly Good

SSE uses a single persistent HTTP connection. The server streams events to the client in a dead-simple text format. The client never sends data back on that connection.

Client          Server
  |--- GET /stream ----------->|
  |<-- HTTP 200 (keep-alive) --|
  |<-- data: {event 1} --------|
  |<-- data: {event 2} --------|
  |<-- data: {event 3} --------|
  |           ...

Each event looks like this:

event: notification
id: 42
data: {"userId": 7, "type": "mention"}

The blank line terminates the event. That's the entire protocol. You could write a compliant SSE server in about 20 lines.
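To back up the "about 20 lines" claim, here is a sketch of the serialization half of such a server. `format_sse_event` is a hypothetical helper, not part of any library; it only builds the text that would be written to the open response.

```python
from typing import Optional


def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # A payload containing newlines becomes one "data:" line per payload line.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # The trailing blank line tells the client the event is complete.
    return "\n".join(lines) + "\n\n"


wire = format_sse_event('{"userId": 7, "type": "mention"}',
                        event="notification", event_id="42")
```

`wire` now holds exactly the three lines shown above plus the terminating blank line. A real handler would write that string to a response whose Content-Type is `text/event-stream` and flush after each event.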

SSE has two properties that make it underrated: automatic reconnection and event IDs. When the connection drops, the browser reconnects automatically and sends a Last-Event-ID header carrying the last id it saw. The server can then replay everything after that ID, provided it keeps a buffer of recent events. You get resumability with no client-side reconnection logic to write and no library needed.
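The browser sends Last-Event-ID for free, but the replay is the server's job. One way to do it, sketched under the assumption that event IDs are increasing integers and that a bounded in-memory buffer is good enough (`ReplayBuffer` is a hypothetical name):

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can catch up."""

    def __init__(self, max_events: int = 1000) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=max_events)
        self._next_id = 1

    def append(self, data: str) -> int:
        """Store an event and return the id to send in its "id:" field."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return event_id

    def replay_after(self, last_event_id: Optional[str]) -> List[Tuple[int, str]]:
        """Events newer than the Last-Event-ID header value (None on first connect)."""
        if last_event_id is None:
            return []  # fresh connection: nothing to replay
        try:
            last_seen = int(last_event_id)
        except ValueError:
            return []  # unparseable header: treat it like a fresh connection
        return [(eid, data) for eid, data in self._events if eid > last_seen]
```

The buffer is bounded, so a client that stays offline longer than the buffer covers will miss events. Whether that is acceptable, or whether replay should come from a durable log instead, is a product decision this sketch does not make.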

Per-message overhead is tiny. Roughly 5 bytes of field names on a persistent connection. No headers per message, no handshake per poll cycle.

The limit is obvious: one direction. The client can't push data back on the SSE connection. If your feature needs the client to send real-time updates (typing indicators, cursor positions, game inputs), SSE alone isn't enough. You'd pair it with regular HTTP POST requests for writes.

Who uses this in production? Most of the big LLM APIs. OpenAI, Anthropic, and Google all stream token-by-token responses over SSE. That thing where a chatbot types its response word by word as you watch? That's an SSE stream. Each chunk of tokens is a separate event on a persistent HTTP connection. The resurgence of SSE in 2025-2026 is almost entirely driven by AI product adoption. SSE went from "the weird one" to "the one everyone's using for AI" in about two years.

WebSocket: Starts as HTTP, Becomes Something Else Entirely

WebSocket starts a polite HTTP conversation and then abandons the entire protocol.

Client                           Server
  |--- GET /ws HTTP/1.1 -------->|
  |    Upgrade: websocket        |
  |    Sec-WebSocket-Key: xyz... |
  |                              |
  |<-- 101 Switching Protocols --|
  |    Sec-WebSocket-Accept: abc |
  |                              |
  |<======= full duplex ========>|
  |--- frame: "hello" ---------->|
  |<-- frame: "world" -----------|

After the 101 response, the connection stops being HTTP. It becomes a raw bidirectional pipe. Either side can send at any time. Per-frame overhead is 2-14 bytes. Latency under 50ms is typical for a well-run deployment.
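The `xyz...` and `abc` placeholders in the handshake are not arbitrary. Per RFC 6455, the server proves it understood the upgrade by hashing the client's key together with a fixed GUID. A sketch of that computation, with `websocket_accept` as a name chosen for this example:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; it is the same for every WebSocket server.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Feeding it the sample key from the RFC, `dGhlIHNhbXBsZSBub25jZQ==`, yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. In practice a WebSocket library does this for you; the point is that the handshake is plain HTTP plus one hash.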

WebSocket is the right answer when the client needs to send data at high frequency. Chat messages, collaborative editing operations, live game inputs, trading orders: these require the client to push to the server constantly. SSE can't do that. Long polling can simulate it, but you'd be opening a new HTTP request for every single user action.
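The gap in per-action cost is where the 2-14 byte figure comes from. This sketch computes the frame header size from the RFC 6455 framing rules; the function name is hypothetical.

```python
def frame_header_bytes(payload_len: int, masked: bool) -> int:
    """Size of a WebSocket frame header for a payload of the given length."""
    size = 2  # one byte for FIN/opcode, one for the mask bit and 7-bit length
    if payload_len > 65535:
        size += 8  # 64-bit extended payload length
    elif payload_len > 125:
        size += 2  # 16-bit extended payload length
    if masked:
        size += 4  # masking key, required on client-to-server frames
    return size


# A short chat message sent from a browser: 2 bytes of header + 4 bytes of mask.
chat_message_overhead = frame_header_bytes(20, masked=True)
```

So a typical browser-sent chat message carries 6 bytes of framing, against the kilobyte or more of headers that a fresh HTTP request would spend on the same action.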

The cost is infrastructure complexity. WebSocket connections are stateful. Your load balancer has to route every frame from a given client to the same backend server (a sticky session), which complicates horizontal scaling considerably. You also have to implement your own reconnection logic and heartbeat since the browser won't auto-reconnect on drop the way it does with SSE.
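The reconnection and heartbeat logic you end up writing usually comes down to two small decisions: how long to wait before retrying, and when to declare a connection dead. A sketch of both, assuming exponential backoff with full jitter and a ping/pong heartbeat; the function names and default values are illustrative, not taken from any library.

```python
import random


def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    Full jitter: a random wait between 0 and min(cap, base * 2^attempt), so
    thousands of clients dropped at once do not all reconnect together.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0, ceiling)


def is_connection_stale(last_pong_at: float, now: float,
                        heartbeat_interval_s: float = 15.0,
                        missed_allowed: int = 2) -> bool:
    """True once more than `missed_allowed` heartbeat intervals pass without a pong."""
    return now - last_pong_at > heartbeat_interval_s * missed_allowed
```

A client would call `is_connection_stale` on a timer, close the socket when it returns True, then sleep for `reconnect_delay(attempt)` before each retry and reset `attempt` to zero after a successful connect.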

Slack runs on WebSocket. When you send a message, your client pushes a frame to the server over the existing connection. The server pushes that message to every recipient. Sub-100ms round trips at scale, covered in detail in Slack System Design.

The Numbers That Actually Decide It

                        Long Polling              SSE                     WebSocket
Per-message overhead    1,400+ bytes              ~5 bytes                2-14 bytes
Average latency         0 to 30s                  < 200ms                 < 50ms
Direction               Simulated bidirectional   Server to client only   Full duplex
Auto-reconnect          You write it              Browser handles it      You write it
Proxy-friendly          Yes                       Yes                     Sometimes
Sticky sessions needed  No                        No                      Yes
Browser support         100%                      ~95%                    99%+

The proxy-friendly row matters more than most candidates realize. Corporate networks often block or time out connections that don't look like regular HTTP. SSE is regular HTTP. Long polling is regular HTTP. WebSocket uses the Upgrade header, and some proxies strip it or refuse to pass it through. This is why libraries like Socket.IO and SockJS keep an HTTP long-polling transport ready for when the upgrade is rejected. Your fancy WebSocket deployment and your users' IT department may have different opinions about what protocols are allowed.

Long polling in a greenfield system is a last resort. SSE for server-push scenarios: notifications, feeds, live logs, AI streaming. WebSocket when the client needs to push too.

[Figure: Protocol decision flow. One question collapses the space from three options to one.]

Start with the direction question. Everything else falls out.
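The whole decision fits in a few lines. This sketch encodes the rules from this section; the `streaming_blocked` flag is an assumption added for the example, standing in for networks where proxies break long-lived connections.

```python
def choose_protocol(client_pushes_realtime: bool, streaming_blocked: bool = False) -> str:
    """Pick a transport by asking the direction question first."""
    if streaming_blocked:
        # Last resort: plain request/response is the only thing that gets through.
        return "long polling"
    if client_pushes_realtime:
        # Chat, collaborative editing, game inputs: the client talks constantly.
        return "WebSocket"
    # Notifications, feeds, live logs, AI streaming: the server does the talking.
    return "SSE"
```

For a notification feed, `choose_protocol(client_pushes_realtime=False)` returns `"SSE"`; for a chat app, `choose_protocol(client_pushes_realtime=True)` returns `"WebSocket"`.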

How to Say This in an Interview

When a design question has a real-time component, the interviewer is waiting for you to raise the protocol choice unprompted. If you wait for them to ask, you've already missed a signal.

Don't just say "we'll use WebSockets." Walk through the decision.

Start with one clarifying question: is this communication one-way or two-way? That answer collapses the options immediately.

For a notification system or live activity feed, the answer is one-way. Say: "Since we only need server push, I'd use SSE over WebSocket. Simpler to scale, no sticky sessions, and the browser handles reconnection automatically with the Last-Event-ID mechanism. Client-side writes like marking a notification read go over normal HTTP POST." The Notification System Design walkthrough covers how the fan-out layer upstream connects to this SSE delivery tier.

For a chat app or collaborative editor, the answer is two-way. Say: "We need full-duplex, so WebSocket. The trade is sticky sessions at the load balancer. I'd put Redis Pub/Sub behind the WebSocket servers so any server can route messages across the cluster without clients needing to share a backend." For the full treatment of what sits on top of the WebSocket layer in a collaborative tool, see Collaborative Editor System Design.

Practice saying this out loud at SpaceComplexity, which runs voice-based system design interviews with rubric feedback. Knowing the tradeoffs is one thing. Articulating them clearly under pressure, while someone watches, is the part that actually gets scored.

The Sticky Sessions Trap

Most candidates drop the protocol choice and move on. The follow-up lands like a trapdoor.

When you propose WebSocket, a good interviewer will ask: "how does your load balancer handle this?" The wrong answer is "it routes requests normally."

WebSocket connections are long-lived TCP connections. The load balancer can't treat each frame as a separate HTTP request to be distributed across backends. Once a client connects to a backend server, every subsequent frame from that client has to reach that same backend. If it doesn't, the server has no idea what the client is talking about.

Use cookie-based or IP-hash affinity at the load balancer, and offload cross-server messaging to a message broker. With Redis Pub/Sub, when a WebSocket server receives a message, it publishes to a channel. Every other subscribed server forwards it to its own connected clients. Connections stay sticky, but message routing is decentralized. You scale WebSocket servers horizontally without the load balancer needing to route messages across backends.
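Here is the shape of that architecture as a runnable sketch. It does not talk to Redis: `InMemoryBroker` is a stand-in that mimics publish/subscribe fan-out, and `WebSocketServer` and `pick_backend` are hypothetical names. Client "connections" are just lists that collect delivered messages.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: every subscriber to a channel gets every message."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in list(self._subscribers[channel]):
            callback(message)


class WebSocketServer:
    """One backend node; `clients` maps a client id to the messages delivered to it."""

    def __init__(self, name: str, broker: InMemoryBroker, channel: str = "chat") -> None:
        self.name = name
        self.broker = broker
        self.channel = channel
        self.clients: Dict[str, List[str]] = {}
        broker.subscribe(channel, self._deliver_locally)

    def connect(self, client_id: str) -> None:
        self.clients[client_id] = []

    def on_client_message(self, message: str) -> None:
        # Publish instead of delivering directly; every node, this one included,
        # hears it through its subscription and forwards to its own clients.
        self.broker.publish(self.channel, message)

    def _deliver_locally(self, message: str) -> None:
        for inbox in self.clients.values():
            inbox.append(message)


def pick_backend(client_ip: str, backends: List[WebSocketServer]) -> WebSocketServer:
    """IP-hash affinity: the same IP maps to the same backend while the pool is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

Connect one client to each of two servers, send a message through the first, and it shows up in the inbox of the client on the second. No server ever needs to know which node holds which connection. Note that a simple modulo hash reshuffles most clients when the pool size changes, which is one reason cookie-based affinity is often preferred.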

This is the standard architecture behind every large-scale WebSocket deployment. Name it in an interview and you've shown you thought past the protocol choice to the actual systems problem. That's what separates a protocol name-dropper from a systems thinker.

Start With One Question

  • Long polling works everywhere, but every round trip pays 1,400+ bytes in HTTP headers. Delivery is near-immediate while a request is parked at the server; events that land between requests wait for the client to reconnect. Use it as a fallback, not a first choice.
  • SSE is one persistent connection, ~5 bytes per message, and the browser auto-reconnects via Last-Event-ID. No sticky sessions. The dominant choice for server-push today, especially AI streaming.
  • WebSocket is the only option for high-frequency bidirectional traffic. Full duplex, sub-50ms latency. Requires sticky sessions and your own reconnection logic.
  • One-way or two-way. Ask that first. The rest follows.
