Slack System Design Interview: Real-Time Fan-Out at Billions of Messages a Day

May 27, 2026 · 10 min read
interview-prep · career · dsa · algorithms
TL;DR
  • Fan-out is the core constraint: one write to a 10K-member channel with 2,000 members online triggers 2,000 delivery events
  • Persist before delivering: MySQL write completes before the real-time stack sees the message
  • Channel Servers own channels via consistent hashing; CHARMs detect failures and reassign within 20 seconds
  • Channel Servers keep an in-memory index of which Gateway Servers have subscribers for each channel, so a message goes only to GSs with active subscribers; each GS then delivers to its locally subscribed clients
  • Presence uses a viewport trick: clients subscribe only to the ~50 members visible on screen, not all 10,000 channel members
  • Shard by workspace_id: all data for one company lives on a single Vitess shard, routed via a metadata cluster
  • At-least-once delivery with client-side dedup via client_msg_id is simpler and faster than any server-side exactly-once scheme

In a Slack system design interview, the hard part is not storing messages. It's getting one message to ten thousand people, scattered across time zones, in under 500 milliseconds. That's the fan-out problem, and Slack's entire real-time stack exists to solve it.

One mental model carries the whole design: Slack splits persistence and delivery. A message lands durably in MySQL before anyone sees it. A separate, stateful layer then pushes it to every connected member of the channel, worldwide. Hold that split and everything else follows.


https://assets.spacecomplexity.ai/blog/content-images/slack-system-design-interview/1779857969941-meme.jpg Someone pinged Aman on Slack so many times that an engineer had to write a commit to kill the feature. That commit is the fan-out problem, embodied.


Clarify the Scope First

Spend the first five minutes narrowing scope. Slack has dozens of features. You cannot design all of them in 45 minutes, and interviewers don't expect you to.

Core scope for the interview:

  • Send and receive messages in channels and DMs
  • Real-time delivery to online users
  • Persistent message history
  • Online presence indicators
  • File attachments

Stretch goals to offer but not commit to:

  • Full-text search
  • Threads
  • Push notifications for offline users

Ask the interviewer whether to prioritize search or the real-time delivery path. Most interviewers want the real-time side. Search is a separate 20-minute rabbit hole. Budget accordingly.


Estimate the Scale

Slack in 2025 sits at roughly 40 million daily active users, 750,000 organizations, and 1.5 billion messages per day. That's about 17,000 messages per second on average and 100,000 per second at peak.

The interesting math is fan-out. A large public channel can have 10,000 members. If 2,000 are online when a message lands, one write creates 2,000 delivery events. Multiply that across thousands of active channels and you've got a genuinely nasty problem. A naive approach that queries the database for subscribers on every send would collapse fast.

Fan-out is the constraint that drives the entire architecture. Every interesting decision follows from that number.

Database scale: 2.3 million QPS peak, 7:1 reads to writes, median latency 2 ms, p99 11 ms.
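This arithmetic is easy to fumble out loud, so here it is as a small Python sketch. The inputs are the article's own estimates: 1.5 billion messages a day, 2,000 of 10,000 members online, and 2.3 million peak database QPS at 7:1. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Inputs are the article's estimates, not measured values.
MSGS_PER_DAY = 1_500_000_000
ONLINE_MEMBERS = 2_000        # of a 10,000-member channel
DB_PEAK_QPS = 2_300_000
READS_PER_WRITE = 7           # 7:1 read-to-write ratio


def average_msgs_per_sec(msgs_per_day: int) -> float:
    return msgs_per_day / SECONDS_PER_DAY


def delivery_events(online_members: int, messages: int = 1) -> int:
    # Each message is pushed once to every online member of the channel.
    return online_members * messages


def read_write_split(total_qps: int, reads_per_write: int) -> tuple[float, float]:
    # A 7:1 ratio means writes are 1/8 of total traffic.
    writes = total_qps / (reads_per_write + 1)
    return total_qps - writes, writes


if __name__ == "__main__":
    print(f"average msgs/sec: {average_msgs_per_sec(MSGS_PER_DAY):,.0f}")  # 17,361
    print(f"deliveries per send: {delivery_events(ONLINE_MEMBERS):,}")      # 2,000
    reads, writes = read_write_split(DB_PEAK_QPS, READS_PER_WRITE)
    print(f"peak reads/sec: {reads:,.0f}, writes/sec: {writes:,.0f}")     # 2,012,500 and 287,500
```

The read/write split is arithmetic on the stated ratio, not a published Slack figure.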


Five Boxes, One Pipeline

https://assets.spacecomplexity.ai/blog/content-images/slack-system-design-interview/1779857970665-architecture.png The five-box pipeline: Client connects via WebSocket to Gateway Servers. Channel Servers own the fan-out. Admin Servers bridge HTTP to the CS layer. All persistence flows through MySQL/Vitess. Presence Servers handle online indicators via GS as proxy.

Five major components, left to right:

  • WebApp API: HTTP endpoints for sending messages and fetching history. Originally PHP with HHVM, now backed by Java services on the real-time critical path.
  • Gateway Servers (GS): stateful, in-memory, maintain WebSocket connections and per-user channel subscriptions. Deployed across geographic regions.
  • Channel Servers (CS): stateful, in-memory, own channel membership and fan-out responsibility. A single CS host manages roughly 16 million channels at peak.
  • Admin Servers (AS): bridge the HTTP API layer to the Channel Servers. The routing layer.
  • Presence Servers (PS): in-memory, track who is online. Queried through the GS as a proxy.

What Goes in the Database

Shard by workspace_id. All data for a given company lives on the same Vitess shard. A metadata cluster maps workspace_id to shard. Cross-workspace queries don't happen at the database layer.

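-- workspace_id on every table routes to the right Vitess shard
CREATE TABLE messages (
  message_id    BIGINT PRIMARY KEY,   -- Snowflake ID; exposed as Slack's `ts`
  workspace_id  BIGINT NOT NULL,
  channel_id    BIGINT NOT NULL,
  user_id       BIGINT NOT NULL,
  content       TEXT,
  client_msg_id VARCHAR(64),          -- dedup token from client
  created_at    TIMESTAMP
);

CREATE TABLE channels (
  channel_id    BIGINT PRIMARY KEY,
  workspace_id  BIGINT NOT NULL,
  name          VARCHAR(255) NOT NULL,
  is_private    BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE channel_members (
  channel_id    BIGINT NOT NULL,
  user_id       BIGINT NOT NULL,
  workspace_id  BIGINT NOT NULL,      -- shard key, so membership rows live with their workspace
  PRIMARY KEY (channel_id, user_id)
);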

Slack's ts field in the public API is the message ID masquerading as a Unix timestamp. It's unique within a channel and monotonically increasing, so ordering comes for free.
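The schema comment calls message_id a Snowflake ID. The sketch below follows the layout Twitter published for Snowflake: 41 bits of milliseconds, 10 bits of worker ID, and 12 bits of sequence. Whether Slack uses exactly this layout is an assumption, and EPOCH_MS is a hypothetical choice. The sketch shows why ordering "comes for free": the timestamp occupies the high bits, so IDs from one generator sort in creation order.

```python
import threading
import time

# Hypothetical custom epoch (2020-01-01 UTC); any fixed past instant works.
EPOCH_MS = 1_577_836_800_000
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Snowflake-style IDs: [ms since EPOCH_MS | worker_id | sequence]."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                now = self.last_ms  # clock went backwards: never reuse an earlier ms
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4,096 IDs used this millisecond: wait for the clock to advance.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


def id_to_unix_ms(snowflake_id: int) -> int:
    """Recover the creation time in Unix milliseconds from the high bits."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Across different workers, IDs are only roughly time-ordered. That is one reason the article scopes the ordering guarantee to a single channel.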

Files never live in the database. A file upload goes directly from the client to S3 via a pre-signed URL. The resulting message carries the S3 URL as a reference. File bytes never flow through Slack's backend servers.


The Message Send Path

This is the section most candidates hand-wave. Don't. Walk it step by step with actual component names.

https://assets.spacecomplexity.ai/blog/content-images/slack-system-design-interview/1779857971053-message-send-path.png The six-step send path. MySQL write happens before the real-time stack sees anything. Crash after step 2 and the message is safe. The CS only fans out to GSs that actually have subscribers.

Step 1. The user hits send. The client generates a random client_msg_id as a dedup token and POSTs the message to the WebApp API (or sends it through the existing WebSocket).

Step 2. The WebApp API writes the message to MySQL via Vitess and gets back a server-assigned message_id. This write happens before anything goes to the real-time stack. If the server crashes after this point but before responding, the message is still safe: the client retries with the same client_msg_id, and the server recognizes the duplicate instead of storing a second copy.
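Step 2's retry safety can be sketched as an idempotent insert. SQLite stands in for MySQL/Vitess here. The UNIQUE constraint on (channel_id, client_msg_id) is an assumption about how "the server detects the duplicate", since the article does not name the mechanism. persist_message is a hypothetical helper.

```python
import sqlite3

# SQLite stands in for MySQL/Vitess. ASSUMPTION: a UNIQUE constraint on
# (channel_id, client_msg_id) is how duplicates are detected.
SCHEMA = """
CREATE TABLE messages (
    message_id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for a Snowflake ID
    workspace_id  INTEGER NOT NULL,
    channel_id    INTEGER NOT NULL,
    user_id       INTEGER NOT NULL,
    content       TEXT,
    client_msg_id TEXT NOT NULL,
    UNIQUE (channel_id, client_msg_id)
);
"""


def persist_message(conn: sqlite3.Connection, workspace_id: int, channel_id: int,
                    user_id: int, content: str, client_msg_id: str) -> int:
    """Insert once; a retry with the same client_msg_id returns the original message_id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO messages "
            "(workspace_id, channel_id, user_id, content, client_msg_id) "
            "VALUES (?, ?, ?, ?, ?)",
            (workspace_id, channel_id, user_id, content, client_msg_id),
        )
    row = conn.execute(
        "SELECT message_id FROM messages WHERE channel_id = ? AND client_msg_id = ?",
        (channel_id, client_msg_id),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = persist_message(conn, 1, 42, 7, "hello", "c-123")
retry = persist_message(conn, 1, 42, 7, "hello", "c-123")  # same row, no second copy
```

INSERT OR IGNORE also swallows other constraint violations. A production version would target the specific conflict instead, for example with MySQL's INSERT ... ON DUPLICATE KEY UPDATE.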

Step 3. The API passes the persisted message (now carrying a stable message_id) to the Admin Server.

Step 4. The Admin Server routes to the Channel Server responsible for this channel. CS assignment uses consistent hashing via CHARMs (Consistent Hash Ring Managers), which publish ring state to Consul.

Step 5. The Channel Server fans out. It holds an in-memory index of which Gateway Servers have at least one subscriber for this channel. It sends the message payload to each of those GSs, and only those GSs.

Step 6. Each Gateway Server pushes the message down the WebSocket to its connected clients that are subscribed to this channel.

Delivery from CS to client is at-least-once. A slow or failing GS may receive a retry. Clients use client_msg_id to drop duplicates before rendering.
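On the receiving side, dedup can be as small as a bounded set of recently seen client_msg_id values. This is a sketch: the class name and the LRU window size (max_tracked) are hypothetical, chosen so that client memory stays flat in long sessions.

```python
from collections import OrderedDict


class DedupingRenderer:
    """Drops repeated deliveries of the same message before rendering."""

    def __init__(self, max_tracked: int = 10_000) -> None:
        self.max_tracked = max_tracked           # hypothetical window size
        self.seen: OrderedDict[str, None] = OrderedDict()
        self.rendered: list[str] = []

    def on_delivery(self, client_msg_id: str, text: str) -> bool:
        """Render the message and return True, or return False for a duplicate."""
        if client_msg_id in self.seen:
            self.seen.move_to_end(client_msg_id)  # refresh recency
            return False                          # repeat from an at-least-once retry
        self.seen[client_msg_id] = None
        if len(self.seen) > self.max_tracked:
            self.seen.popitem(last=False)         # evict the least recently seen ID
        self.rendered.append(text)
        return True
```

A duplicate that arrives after its ID has been evicted would render twice. The window therefore only needs to cover the longest plausible retry delay.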


Why Channel Servers Own the Fan-Out

The CS is the key architectural insight. Rather than looking up channel subscribers in the database on every message, the CS holds that state in memory and keeps it warm.

Each CS owns a slice of all channels via a consistent hash ring. Holding 16 million channels per host is feasible because a channel at rest is a tiny in-memory record. Most channels are quiet most of the time.

https://assets.spacecomplexity.ai/blog/content-images/slack-system-design-interview/1779857971455-hash-ring.png The consistent hash ring assigns channels to CS nodes by position. CHARM manages the ring topology and publishes it to Consul. When a CS fails, CHARMs redistribute its channels to healthy nodes within 20 seconds.

When a CS goes down, CHARMs detect the failure and reassign its channels to healthy nodes within 20 seconds. During that window, real-time delivery for affected channels pauses. Messages are already persisted in MySQL, so clients catch up via a history fetch once the new CS warms up. Nobody loses data. They just wait a moment.
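A minimal consistent hash ring shows why a CS failure is contained. Removing a host moves only the channels that host owned, so every other Channel Server keeps its warm state. The virtual-node count, MD5 placement, and host names are assumptions for illustration. The article does not describe CHARM's internals beyond managing the ring and publishing it to Consul.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping channel IDs to Channel Server hosts."""

    def __init__(self, hosts: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes              # virtual nodes per host smooth the load
        self.points: list[int] = []       # sorted hash positions on the ring
        self.owner: dict[int, str] = {}   # position -> host
        for host in hosts:
            self.add_host(host)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_host(self, host: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{host}#{i}")
            bisect.insort(self.points, point)
            self.owner[point] = host

    def remove_host(self, host: str) -> None:
        # Roughly what a ring manager does after detecting a dead CS.
        self.points = [p for p in self.points if self.owner[p] != host]
        self.owner = {p: h for p, h in self.owner.items() if h != host}

    def host_for(self, channel_id: str) -> str:
        # Walk clockwise to the first point at or after the key, wrapping at the end.
        idx = bisect.bisect(self.points, self._hash(channel_id)) % len(self.points)
        return self.owner[self.points[idx]]


ring = HashRing(["cs-1", "cs-2", "cs-3", "cs-4"])
```

Without virtual nodes, a dead host's whole arc would land on a single neighbor. With them, the orphaned channels spread across all survivors.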


How Gateway Servers Filter Delivery

Every connected client maintains a single persistent WebSocket to the nearest Gateway Server. Proximity is handled by Anycast or DNS geo-routing.

The GS maintains a local map: channel_id → [websocket_connection, ...]. When the CS fans out a message for channel X, it sends only to GSs with subscribers for channel X. A GS with no subscribers receives nothing.
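The two indexes work together. Each GS maps channel_id to its local connections, and the CS tracks which GSs have at least one subscriber per channel. The sketch keeps both in plain Python dictionaries with a single CS. The register/unregister handshake between GS and CS is a hypothetical protocol, not Slack's wire format, and outbox stands in for writes to real WebSockets.

```python
class ChannelServer:
    """Tracks, per channel, the gateways with at least one subscribed client."""

    def __init__(self) -> None:
        self.gateways_by_channel: dict[str, set["GatewayServer"]] = {}

    def register(self, channel_id: str, gs: "GatewayServer") -> None:
        self.gateways_by_channel.setdefault(channel_id, set()).add(gs)

    def unregister(self, channel_id: str, gs: "GatewayServer") -> None:
        gateways = self.gateways_by_channel.get(channel_id)
        if gateways is not None:
            gateways.discard(gs)
            if not gateways:
                del self.gateways_by_channel[channel_id]

    def fan_out(self, channel_id: str, payload: str) -> set[str]:
        # Only gateways in the index are contacted; every other GS gets nothing.
        targets = self.gateways_by_channel.get(channel_id, set())
        for gs in targets:
            gs.deliver(channel_id, payload)
        return {gs.name for gs in targets}


class GatewayServer:
    """Maps channel_id -> IDs of local WebSocket connections."""

    def __init__(self, name: str, cs: ChannelServer) -> None:
        self.name = name
        self.cs = cs
        self.subs: dict[str, set[str]] = {}
        self.outbox: list[tuple[str, str]] = []  # (connection, payload) pushed to sockets

    def subscribe(self, conn_id: str, channel_id: str) -> None:
        if channel_id not in self.subs:
            self.cs.register(channel_id, self)   # first local subscriber
        self.subs.setdefault(channel_id, set()).add(conn_id)

    def unsubscribe(self, conn_id: str, channel_id: str) -> None:
        conns = self.subs.get(channel_id)
        if conns is None:
            return
        conns.discard(conn_id)
        if not conns:
            del self.subs[channel_id]
            self.cs.unregister(channel_id, self)  # last local subscriber left

    def deliver(self, channel_id: str, payload: str) -> None:
        for conn_id in self.subs.get(channel_id, set()):
            self.outbox.append((conn_id, payload))
```

The CS contacts a GS only when that gateway's local count for a channel flips between zero and nonzero. As a result, the CS-side index changes far less often than individual client subscriptions.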

https://assets.spacecomplexity.ai/blog/content-images/slack-system-design-interview/1779857971964-fanout-subscription.png CS fans out to GS-1 and GS-3 (they have subscribers to #general) and skips GS-2 entirely. GS-2's users are not in that channel. No wasted delivery events.

The subscription index is what makes fan-out manageable. Without it, the CS would have to broadcast to all GSs in all regions for every message. With 100K messages per second, that would be spectacular in all the wrong ways.


Presence: The Viewport Trick

Presence Servers are in-memory. Each user is consistently hashed to a specific PS. Clients subscribe to presence updates through their GS, which proxies requests to the appropriate PS.

The optimization that makes this scale: clients only subscribe to presence for users currently visible on screen. A channel with 10,000 members does not create 10,000 presence subscriptions. The client subscribes to the roughly 50 members visible in the current viewport. Scroll down and the subscription list updates.

This drops the presence problem from O(workspace members) to O(viewport size). Typing indicators and green dots respond in milliseconds even in sprawling workspaces. The kind of optimization that looks obvious in retrospect.


Search Gets Its Own Pipeline

Search is asynchronous and eventually consistent. Interviewers rarely expect deep search coverage, but mention the pipeline so they know you've thought about it.

https://assets.spacecomplexity.ai/blog/content-images/slack-system-design-interview/1779857972349-search-pipeline.png Message hits MySQL, gets published to Kafka, gets consumed by the indexing service, lands in Elasticsearch. Seconds of lag. Nobody searches for a message sent four seconds ago.

After a message is written to MySQL, the write path publishes an event to Kafka. A dedicated indexing consumer reads from Kafka and writes to Elasticsearch. The pipeline lags real-time delivery by seconds. That's fine. Elasticsearch handles keyword search, sender filters, date ranges, and channel scoping. The search path scales independently. A Kafka backlog never delays delivery.


Four Tradeoffs Worth Defending

Stateful servers vs. stateless. Stateless would require a database lookup per message to find subscribers. At 100K messages per second that's a non-starter. Statefulness buys fan-out speed at the cost of more complex failure handling. Worth it.

At-least-once delivery. Guaranteeing exactly-once delivery end to end would need extra coordination, such as two-phase commit, and those extra round trips kill latency. Slack accepts duplicate delivery and deduplicates at the client with client_msg_id. Simpler, faster, and more reliable. See The Trade-off Maze for more on this family of tradeoffs.

Vitess over raw MySQL. Application-level sharding gets complicated fast. Vitess adds connection pooling, transparent query routing, and zero-downtime schema changes at the MySQL level. Slack migrated to Vitess in 2017; it now handles 99% of query load with workspace_id as the primary shard key.

Presence decoupled from messages. Presence changes thousands of times per second and carries no long-term value. Separate in-memory PS servers handle it without touching MySQL.


The 45-Minute Clock

Time | What to Cover
0-5 min | Clarify scope: channels, DMs, real-time, presence, files, search. Commit to two of the last three.
5-10 min | Scale: 40M DAU, 1.5B messages/day, 100K/sec peak, fan-out math for a 10K-member channel
10-20 min | High-level architecture: the five-box diagram with named components
20-30 min | Data model and the full message send path, step by step
30-38 min | CS consistent hashing, GS subscription index, presence viewport trick
38-43 min | File storage (direct-to-S3), search pipeline (async Kafka consumer)
43-45 min | Tradeoffs: stateful vs. stateless, at-least-once, Vitess sharding

The message send path and the CS/GS fan-out are where strong answers become great ones. Name the specific components, explain the CHARM mechanism, and walk through the subscription index optimization. That's 80% of what interviewers want.


What the Slack System Design Interview Tests

Five things interviewers are listening for:

  1. Fan-out is the core constraint. Every architectural choice follows from one message spawning 2,000 delivery events.
  2. Persist before you deliver. MySQL write happens before the real-time stack sees the message.
  3. CS and GS are stateful for speed. The subscriber index lives in memory because a DB lookup per message at 100K/sec is not a plan.
  4. Presence uses the viewport trick. O(workspace members) becomes O(50 visible users).
  5. At-least-once with client-side dedup. client_msg_id is the simplest exactly-once substitute that actually works.

Explaining a system design out loud under time pressure is a different skill from understanding the architecture on paper. SpaceComplexity runs voice-based mock system design interviews with rubric-graded feedback on structure, communication, and technical depth, so you practice under conditions that match the real interview.

For related system design walkthroughs, see Design a Chat App Like WhatsApp and the Notification System Design Interview.


Further Reading