Live Comments System Design Interview: Fan-Out Is the Hard Part

May 27, 202612 min read
interview-prepcareerdsaalgorithms
Live Comments System Design Interview: Fan-Out Is the Hard Part
TL;DR
  • Fan-out is the core constraint: one comment must reach 500K open connections under 100ms — write volume (3K–15K writes/sec) is not the bottleneck.
  • Two-path architecture: write path ends at PostgreSQL; the fan-out path goes through Redis pub/sub to WebSocket servers in parallel, decoupling them completely.
  • Snowflake IDs embed timestamps so cursor pagination and chronological ordering share one composite index (post_id, id DESC) with no secondary sort column.
  • Redis pub/sub beats Kafka on the real-time leg: sub-millisecond delivery, no consumer groups needed; Kafka belongs in the async downstream pipeline for analytics and notifications.
  • Viral post strategy: switch from WebSocket push to short polling against a Redis sorted set cache capped at 200 entries when viewer counts spike above a threshold.
  • SSE is an underrated alternative to WebSocket for comment feeds: server-push fits the read-heavy pattern and works natively over HTTP/2 without a client library.
  • Approximate like counts via Redis INCR flushed to Postgres every 30 seconds to avoid hot-row contention on popular comments.

A live comments system looks simple. You have a table. Users append rows. Other users read them. A working prototype takes an afternoon. You show it to your team. Everyone is impressed. You go home feeling good.

Then a post goes viral. Five hundred thousand viewers have the page open simultaneously. One comment arrives. Your system now has to push that single event to 500,000 open connections in under a hundred milliseconds. The afternoon prototype does not survive contact with this requirement. The fan-out is the actual problem, and that's what the interviewer is waiting to see you address.

This is the live comments system design interview. Here's how to spend 45 minutes on it without running out of things to say.

What to Clarify Before You Draw a Single Box

The word "comments" hides a lot of variation. Before any whiteboard marks, ask:

  • Real-time or near-real-time? A blog comment section can poll every 30 seconds and nobody notices. YouTube Live chat needs sub-second delivery. The architecture diverges completely depending on the answer.
  • Threaded or flat? One level of nesting (Facebook-style) versus deep threading (Reddit-style) changes the data model significantly.
  • Read-heavy or write-heavy? Live video comments skew read-heavy: thousands of viewers per commenter. Most systems sit at roughly a thousand reads per write.
  • Moderation requirements? Synchronous content filtering blocks your critical path. Asynchronous flagging is far cheaper and almost always good enough.

For this walkthrough: real-time delivery, one level of nesting, 50M DAU, peak 500K concurrent viewers on a popular post, and async moderation.

Scale estimate: 50M users, five comments per user per day = 250M comments/day = roughly 3,000 writes/second average, 15,000 writes/second at peak. That's not write-heavy by database standards. The write load will be fine. You can feel almost disappointed by how fine it will be. The fan-out load is where this system actually gets interesting.

Two Paths, One System

Keep two separate mental models in your head from the start: the write path and the fan-out path. They use different components and have different latency budgets. Treating them as the same problem is how you end up in a confused tangle halfway through your explanation.

Two-path architecture: write path ends at PostgreSQL, fan-out path flows through Redis pub/sub to WebSocket servers pushing to 500K clients

One comment triggers two completely independent flows. Comment Service publishes once. Redis does the rest.

The write path: the client POSTs a comment. API Gateway authenticates and rate-limits. Comment Service validates content, writes to PostgreSQL (source of truth), then publishes to a Redis pub/sub channel keyed by post_id.

The fan-out path: every WebSocket server that has clients subscribed to post_id receives the message from Redis and pushes it down those local connections. Redis broadcasts to all subscribed servers simultaneously. Each server only fans out to its own clients. The work is parallel.

Comment Service doesn't know or care how many WebSocket servers exist, or how many clients are connected to each. It publishes once. This decoupling is the key architectural decision. Name it explicitly in your interview. Interviewers love when you name it.

The Schema Is Simple. The Indices Are the Insight.

Keep the table narrow. You can always add columns; you cannot un-index a hot table at scale.

CREATE TABLE comments ( id BIGINT PRIMARY KEY, -- Snowflake ID (time-ordered) post_id BIGINT NOT NULL, parent_id BIGINT REFERENCES comments(id), user_id BIGINT NOT NULL, content TEXT NOT NULL, status SMALLINT DEFAULT 0, -- 0=visible, 1=deleted, 2=flagged like_count INT DEFAULT 0, created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_comments_post ON comments(post_id, id DESC); CREATE INDEX idx_comments_replies ON comments(parent_id, id DESC) WHERE parent_id IS NOT NULL;

Use a Snowflake-style ID, not a UUID. Snowflake IDs embed a millisecond timestamp in the high bits, so ORDER BY id DESC is equivalent to ORDER BY created_at DESC with no secondary sort column. Every paginated query hits (post_id, id DESC) cleanly.

Don't use offset pagination. New comments arrive while the user is scrolling. Offset 20 shifts when comment 21 is inserted above the reader. You get duplicates. You skip entries. Users notice. Cursor pagination is the right call: pass the last-seen id as the cursor, query WHERE post_id = ? AND id < cursor ORDER BY id DESC LIMIT 20. Stable, index-friendly, and it works perfectly with a live feed.

Offset pagination breaks when new comments arrive mid-scroll; cursor pagination anchors to an ID and stays stable regardless of inserts

The offset approach is like measuring your place in a book by how many pages you skipped, then someone adds pages in front of you. Cursor pagination uses a bookmark.

How a Comment Gets from the Database to Half a Million Browsers

When a client loads a page with comments, two things happen in parallel:

  1. It fetches the initial comment list via HTTP, cursor-paginated.
  2. It opens a WebSocket connection and sends a subscribe message: {"type": "subscribe", "postId": "123"}.

The WebSocket server adds that connection to a local subscriber map keyed by post_id. It also subscribes to the Redis pub/sub channel comments:123 if it hasn't already.

When Comment Service writes a new comment and publishes to comments:123, Redis delivers the message to every WebSocket server subscribed to that channel. Each server iterates its local subscriber map for that post and sends the comment payload down every open socket.

Use Redis pub/sub on this leg, not Kafka. Redis delivers in sub-millisecond. Kafka adds 5-50ms of latency plus coordination overhead you don't need here. The comment is already durable in PostgreSQL. The pub/sub channel is ephemeral glue. If a WebSocket server misses a message because it was restarting, the client reconnects and catches up by re-fetching with its last-seen cursor. That's an acceptable recovery path.

Kafka belongs downstream: async jobs for spam scoring, notification delivery, analytics pipelines. The rule is simple. If you're configuring a consumer group for real-time comment push, you've gone somewhere wrong. Keep Kafka off the critical path from database commit to client render.

One Post, 500,000 Viewers, Ten Comments Per Second

Run the numbers honestly.

Fan-out arithmetic: 10 comments/sec times 500K viewers equals 5 million pushes per second, handled by 5 Go WebSocket servers at 1M messages/sec each

Ten comments per second, each pushed to 500K connections = 5,000,000 pushes per second. Can you handle this? Yes, because the work is embarrassingly parallel.

A Go WebSocket server with 100K active connections can push about 1,000,000 small messages per second before CPU saturates. Five WebSocket servers handle the fan-out for this one post, each receiving ten pub/sub messages per second from Redis and pushing to 100K local clients. Five servers. Done. This is one of those moments in system design where the math works out cleanly and you should let it.

The real problem is message velocity, not viewer count. During a YouTube premiere, comments can arrive at thousands per second. At that rate, pushing every individual comment is both wasted work (clients cannot read that fast) and a network firehose. Two practical solutions:

  • Batch and throttle. Aggregate comments server-side. Every 500ms, flush all buffered comments for a post to subscribers in a single payload. Clients receive a batch and render the last visible few. This collapses 1,000 per-message pushes into 2 per-second batches per server.
  • Adaptive polling fallback. For posts above a viewer-count threshold, switch from WebSocket push to short polling. Clients poll every one to two seconds. The server maintains a Redis sorted set of recent comments (ZADD comments:{postId} <id> <payload>, capped at 200 entries). Polling is embarrassingly parallel read traffic against a cache. No fan-out pressure whatsoever.

For viral posts, short polling from a Redis cache often outperforms WebSocket fan-out. The read-heavy access pattern is exactly what a cache is designed for. The latency cost of one to two seconds is invisible against a comment stream moving at 50 entries per second.

For normal posts use WebSocket push; for viral posts above 100K concurrent viewers, switch to adaptive polling against a Redis sorted set cache

Pick your weapon based on the post's viewer count. The same system runs both modes.

15K Writes per Second Is, Genuinely, a Solved Problem

At 15,000 writes per second, a single PostgreSQL primary is fine. Say that in the interview and move on quickly. Modern hardware with async commits handles 20,000-50,000 row inserts per second without breaking a sweat. Add a second primary shard keyed by post_id % 2 only when you actually need it, which at this scale you don't.

Two read replicas absorb the initial comment-load queries. Direct cursor-paginated reads at replicas; direct writes at the primary.

Cache the first page of comments per post in Redis as a sorted set (ZRANGEBYSCORE). Popular posts hit the cache on every load. Once a post is warm, replicas see mostly cold traffic and the primary sees only writes.

Incredibles meme: "the database is slow, let's just use cache for everything" with the villain addressing a crowd that is visibly skeptical

Every tech lead who has touched a viral comment section eventually arrives here.

Approximate like counts to avoid hot rows. Incrementing like_count in PostgreSQL on every like creates contention on hot comment rows. A popular comment on a viral post gets thousands of likes per second. Your whole transaction throughput dies on one counter. Better: accumulate likes in Redis (INCR likes:{commentId}), flush to Postgres asynchronously every 30 seconds. The displayed count lags by at most 30 seconds. Nobody has ever complained that a like count was thirty seconds stale.

The Comparisons Your Interviewer Will Ask For

WebSocket vs SSE vs long-polling:

WebSocketSSELong-poll
DirectionBidirectionalServer push onlyServer push only
Load balancingRequires sticky sessions or shared pub/subSameStateless
Auto-reconnectManualBuilt-inBuilt-in
HTTP/2 supportNoneMultiplexedNone
ComplexityHighMediumLow

SSE is underrated for comment feeds. Comments are a read-heavy stream: the browser reads, the user writes via separate POST requests. SSE fits that model exactly and works natively in every browser without a client library. The main gap is browser connection limits (six per origin over HTTP/1.1), but HTTP/2 removes this with multiplexing. Most teams cargo-cult WebSocket because that's what they've seen before. Mentioning SSE as a considered alternative in your interview signals that you actually thought about the access pattern.

Redis pub/sub vs Kafka on the fan-out leg: Redis is fire-and-forget. No consumer groups, no message replay. If a WebSocket server is restarting when a message is published, that subscriber misses it and catches up via cursor on reconnect. This is acceptable for chat-style feeds. Kafka is the right answer when you need guaranteed delivery to every downstream consumer, but that belongs to the async processing layer, not the real-time push layer.

Consistency of comment ordering: Comments from multiple writers arrive at different PostgreSQL replicas at slightly different timestamps. Snowflake IDs are generated by a central ID service, so their ordering is authoritative. The client sorts by ID, not by created_at, and the ordering is stable. Use database-generated timestamps without a Snowflake service and you need to account for clock skew across nodes.

A Clock for the Full 45 Minutes

  • 0-5: Clarify requirements. Real-time or near? Nested or flat? What scale? Moderation sync or async?
  • 5-15: Draw the two-path architecture. Name the write path and the fan-out path explicitly. Explain why pub/sub decouples Comment Service from WebSocket servers.
  • 15-22: Walk through the data model. Snowflake IDs, the two indices, cursor pagination. Sketch the WebSocket subscribe flow.
  • 22-32: Deep-dive the real-time path. Redis pub/sub mechanics, why Kafka stays off the hot path, what happens on client reconnect.
  • 32-40: The hot post problem. Batching, adaptive polling fallback, Redis sorted set cache. Read replicas and approximate like counts.
  • 40-45: Name the tradeoffs. WebSocket vs SSE, Redis vs Kafka, hot-row mitigation. Invite questions.

The Short Version

  • The fan-out problem, not write volume, drives this architecture.
  • Write path ends at PostgreSQL; fan-out path goes through Redis pub/sub to WebSocket servers.
  • Snowflake IDs make cursor pagination and chronological ordering collapse into one composite index.
  • Redis pub/sub is fast enough for the hot path; Kafka belongs downstream for durability and analytics.
  • For viral posts, switch from WebSocket push to short polling against a Redis sorted set cache.
  • SSE is a viable and often overlooked alternative to WebSocket for read-heavy comment feeds.
  • Approximate like counts via Redis INCR and async flush to avoid hot-row contention.

Practicing this walkthrough out loud is the fastest way to find where your explanation loses the interviewer before the real thing happens. SpaceComplexity runs voice-based mock system design interviews with rubric-based feedback, so you hear the gaps in your narrative before they cost you an offer.

For the push-vs-pull tradeoffs in more depth, see The Trade-off Maze. The Redis caching patterns here are covered in detail in the Distributed Cache System Design walkthrough. The pub/sub fan-out architecture is structurally similar to the Slack System Design if you want to compare a more complex variant.

Further Reading