Email System Design Interview: The 45-Minute Walkthrough

The email system design interview trips most candidates because they treat it as one problem. It's two. Sending email and receiving email are architecturally separate systems that share a data layer, and every strong answer starts by splitting them apart.

Make that split in the first minute and the rest falls into place: an outbound SMTP pipeline (your servers talk to the world), an inbound SMTP gateway (the world talks to your servers), and an HTTP/WebSocket layer above both for the client. Two SMTP flows, six components, one interview.

Email has been around since 1971. It still somehow trips engineers in interviews in 2025. That's not because email is boring. It's because most engineers have never had to design a system where "message lost" is the worst possible outcome, not just an edge case.

Start With Two Flows, Not One

Before drawing anything, say this out loud: "I see two distinct flows here. Outbound delivery and inbound reception use different protocols, different failure modes, and different reliability guarantees. I want to design them separately and then show where they share infrastructure."

Interviewers hear this framing and know you've thought beyond the surface. Most candidates jump straight to the database and never separate the flows at all. They sketch one big blob labeled "email server" and start explaining tables. Don't be that person.

The other thing worth naming early: an email service can mean a consumer webmail product like Gmail, a transactional API like SendGrid, or an enterprise mail server. They share protocols but differ wildly in who the customer is, what reliability means, and where scale pressure shows up. Ask which one you're designing. This walkthrough assumes consumer webmail.

Lock Down Requirements in Five Minutes

Functional:

Send and receive email, attachments up to 25MB
Folders and labels (Inbox, Sent, Trash, Spam, custom)
Full-text search across subject, body, and sender
Threading: group replies under the original message
Read/unread status synced across devices in real time

Non-functional:

Durability above all else. Emails must never be lost.
High availability. Downtime means missed messages.
Eventual consistency is fine for search and inbox rendering. Strong consistency is required for mailbox state: unread count, folder membership, read status.

Email is one of the few consumer apps where users directly notice eventual consistency. Read a message on your phone and your laptop still shows it unread five seconds later: that's not a "known limitation." That's a support ticket.

The Numbers Shape the Design

Gmail has 1.8 billion users. Roughly 300 billion emails are sent globally per day, about 3.5 million per second. For an interview, scope to 500 million users and 1 billion emails per day.

Quick back-of-envelope:

Average email metadata: ~500 bytes
Average email body: ~10KB
1 billion emails per day at 10KB = 10TB of new body data per day
500 million users at 15GB storage each = ~7.5 exabytes if every mailbox were full

That 7.5 exabyte ceiling is what tells you Cassandra plus object storage, not just one database. Real usage sits well below the quota, but even a small fraction of it is far beyond a single system. Metadata and bodies belong in different storage systems. Metadata is small, structured, needs strong consistency, and gets queried constantly. Bodies and attachments are large, append-only, and read infrequently after delivery. That split drives most of the data model decisions you'll make for the rest of the interview.
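The arithmetic is quick to check in a few lines of Python. This is a sketch using decimal units (1KB = 1,000 bytes); the variable names are illustrative, and the inputs are the ones stated above.

```python
# Back-of-envelope numbers from the text, in decimal units (1 KB = 1,000 bytes).
KB, GB, TB, EB = 10**3, 10**9, 10**12, 10**18

USERS = 500_000_000
EMAILS_PER_DAY = 1_000_000_000
BODY_BYTES = 10 * KB
METADATA_BYTES = 500
QUOTA_BYTES = 15 * GB            # per-user storage quota
SECONDS_PER_DAY = 86_400

avg_emails_per_sec = EMAILS_PER_DAY / SECONDS_PER_DAY
body_tb_per_day = EMAILS_PER_DAY * BODY_BYTES / TB
metadata_tb_per_day = EMAILS_PER_DAY * METADATA_BYTES / TB
quota_eb = USERS * QUOTA_BYTES / EB

print(f"average write rate: {avg_emails_per_sec:,.0f} emails/s")
print(f"new body data:      {body_tb_per_day:.1f} TB/day")
print(f"new metadata:       {metadata_tb_per_day:.1f} TB/day")
print(f"quota ceiling:      {quota_eb:.1f} EB")
```

The average rate is only about 11,600 emails per second; peak traffic will be some multiple of that, so state the multiplier you assume.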

Six Boxes, Every Question Answered

Draw this before explaining anything:

Six-box email architecture showing two labeled flows: outbound from API servers through SMTP workers to external mail servers, and inbound from external servers through gateway, queue, and processor to Cassandra and object storage

Two flows, six boxes. Every interviewer question maps to one of these.

When they ask "how does search work?", you're talking about the rightmost box and the async write path from the mail processor. When they ask "how do you handle delivery failures?", you're talking about the queue feeding outbound SMTP workers. Point at a box. It signals you have a mental model, not just a list of buzzwords.

What Happens When You Click Send

Step-by-step send flow: API server validates, writes to Sent folder, enqueues to Kafka, SMTP worker picks up, DNS MX lookup, delivers; failure paths show exponential backoff and bounce handling

HTTP 200 goes back to the client at step three. The actual SMTP delivery is someone else's problem now.

A user clicks send:

API server validates the request: auth check, recipient format, attachment under 25MB.
Email is saved to the Sent folder immediately. User sees it.
A job is enqueued in a durable message queue (Kafka or SQS). The API server returns 200 now.
An outbound SMTP worker pulls the job, does a DNS MX lookup for the recipient's domain, opens a TCP connection to their mail server, and delivers via SMTP (RFC 5321).
Temporary failures (server overloaded, connection timeout): exponential backoff, retry for up to 72 hours. Hard failures (unknown address, domain not found): write a bounce notification to the sender's inbox.

The queue is the delivery reliability guarantee. If an SMTP worker crashes mid-connection, the job stays in the queue and another worker picks it up. Workers are stateless. Scale them horizontally by adding more consumers. This is one of those rare architecture decisions where the right answer is also the simple answer.
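The retry decision in the last step can be sketched in a few lines. RFC 5321 defines the first digit of an SMTP reply code as the signal: 4xx is a transient failure, 5xx is permanent. The base delay, the per-wait cap, and the function names below are assumptions for illustration; only the 72-hour window comes from the flow above.

```python
import random

MAX_RETRY_WINDOW_S = 72 * 3600   # give up after 72 hours, as in the text
BASE_DELAY_S = 60                # assumed delay before the first retry
MAX_DELAY_S = 4 * 3600           # assumed cap on any single wait


def classify_reply(code: int) -> str:
    """Map an SMTP reply code to an action using its first digit (RFC 5321)."""
    if 200 <= code < 300:
        return "delivered"
    if 500 <= code < 600:
        return "bounce"          # permanent failure: notify the sender
    return "retry"               # 4xx, or anything unexpected (assumption)


def backoff_delay(attempt: int, jitter: bool = True) -> float:
    """Delay before retry number `attempt` (0-based): doubles each time, capped."""
    delay = min(BASE_DELAY_S * 2 ** attempt, MAX_DELAY_S)
    if jitter:
        # Spread retries out so workers do not hit a recovering server together.
        delay = random.uniform(delay / 2, delay)
    return delay


def retry_schedule() -> list[int]:
    """Jitter-free list of delays that fit inside the retry window."""
    delays, elapsed, attempt = [], 0, 0
    while True:
        d = int(backoff_delay(attempt, jitter=False))
        if elapsed + d > MAX_RETRY_WINDOW_S:
            return delays
        delays.append(d)
        elapsed += d
        attempt += 1


print(classify_reply(451), classify_reply(550))
print(retry_schedule()[:5])
```

Once the schedule is exhausted, the message is treated like a hard failure and the sender gets a bounce.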

What Happens When Email Arrives

Inbound email flow: external server connects to gateway, gateway rejects or accepts fast, message goes through Kafka queue to mail processor which branches into SPF/DKIM/DMARC checks, spam pipeline, Cassandra write, object store write, and async search indexing, plus WebSocket push

The gateway's job is to be fast and mean. Everything else can wait.

An external mail server connects to your inbound SMTP gateway and delivers a message. The gateway does one thing: accept or reject the SMTP session fast. Heavy processing happens downstream in the mail processor, separated by a queue. If the processing pipeline falls behind, the gateway stays up. If the processor crashes, the queue holds the messages until it recovers. That's not an accident.
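A minimal sketch of that fast accept-or-reject decision, assuming a bounded in-process queue as a stand-in for Kafka. The hosted domain, the size cap, and the function name are hypothetical; the reply codes are standard RFC 5321 codes.

```python
import queue

LOCAL_DOMAINS = {"example.com"}          # hypothetical hosted domain
MAX_MESSAGE_BYTES = 35 * 1024 * 1024     # assumed cap: 25MB of attachments plus encoding overhead


def gateway_accept(rcpt_to: str, raw_message: bytes,
                   inbound: queue.Queue) -> tuple[int, str]:
    """Decide the SMTP reply for one message using only cheap checks."""
    domain = rcpt_to.rpartition("@")[2].lower()
    if domain not in LOCAL_DOMAINS:
        return 550, "relaying denied"            # permanent: not our domain
    if len(raw_message) > MAX_MESSAGE_BYTES:
        return 552, "message too large"          # permanent: over the size cap
    try:
        inbound.put_nowait((rcpt_to, raw_message))   # hand off, no parsing here
    except queue.Full:
        return 451, "try again later"            # transient: the sender retries
    return 250, "queued"


q = queue.Queue(maxsize=1)
print(gateway_accept("bob@example.com", b"Subject: hi\r\n\r\nhello", q))
print(gateway_accept("bob@example.com", b"Subject: hi\r\n\r\nhello", q))
```

The point of the 451 branch is that a full queue pushes retries back onto the sending server instead of dropping mail. In a real system the 250 should only go out after the queue has durably stored the message.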

Processing chain:

Authentication checks: SPF (is this IP authorized to send for this domain?), DKIM (is the message signature valid against the domain's public key?), DMARC (does the From domain align with a passing SPF or DKIM result, and what policy does the domain publish when it doesn't?). Reject or quarantine on failure. These three checks stop most spoofed and phishing mail.
Spam pipeline: fast rule-based checks first (known-bad IPs, malformed headers, DNS blocklists), then ML classification on content. Route to Spam folder or reject. Google reports that Gmail blocks more than 99.9% of spam and phishing with this kind of layered filtering.
Storage write: metadata row to Cassandra, body to object storage. The metadata row stores a body_key pointer, not the body itself.
Real-time push: if the recipient has an active WebSocket connection, push a lightweight event ("new message in INBOX"). Client fetches the full email via normal HTTP.
Search index: async publish to Elasticsearch. The inbox view does not wait for this.
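The first step of the chain can be sketched as a pure function. This is a simplified take on DMARC evaluation: it assumes the SPF and DKIM results were already computed upstream and uses strict alignment (exact domain match) only. The AuthResult type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuthResult:
    header_from_domain: str   # domain in the visible From: header
    spf_pass: bool
    spf_domain: str           # domain SPF authenticated (envelope sender)
    dkim_pass: bool
    dkim_domain: str          # d= domain of the validated DKIM signature
    dmarc_policy: str         # "none", "quarantine", or "reject" from DNS


def dmarc_disposition(r: AuthResult) -> str:
    """Return "deliver", "quarantine", or "reject" for one message.

    Strict alignment only; relaxed alignment, which compares
    organizational domains, is left out to keep the sketch short.
    """
    from_domain = r.header_from_domain.lower()
    spf_aligned = r.spf_pass and r.spf_domain.lower() == from_domain
    dkim_aligned = r.dkim_pass and r.dkim_domain.lower() == from_domain
    if spf_aligned or dkim_aligned:
        return "deliver"          # one passing, aligned mechanism is enough
    if r.dmarc_policy == "reject":
        return "reject"
    if r.dmarc_policy == "quarantine":
        return "quarantine"       # for example, route to the Spam folder
    return "deliver"              # policy "none": monitor only


spoofed = AuthResult("bank.example", True, "attacker.example",
                     False, "", "reject")
print(dmarc_disposition(spoofed))
```

A "deliver" result here only means the message moves on to the spam pipeline, not that it lands in the inbox.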

How the Data Actually Sits

Data model comparison showing partition-by-user_id-only (bad, forces sequential scans) versus partition-by-(user_id,folder) (good, inbox query hits one node), with PRIMARY KEY schema

The partition key decision is the most important schema choice in this whole design. Get it wrong and every inbox load fans out across the cluster.

Two tables drive everything:

-- Partitioned by (user_id, folder), clustered newest-first within each partition
CREATE TABLE emails (
  user_id       UUID,           -- partition key, part 1
  email_id      TIMEUUID,       -- clustering key, newest first
  folder        TEXT,           -- partition key, part 2: INBOX | SENT | TRASH | SPAM | custom
  thread_id     UUID,
  from_addr     TEXT,
  subject       TEXT,
  is_read       BOOLEAN,
  body_key      TEXT,           -- S3 object key
  timestamp     TIMESTAMP,
  PRIMARY KEY ((user_id, folder), email_id)
) WITH CLUSTERING ORDER BY (email_id DESC);

-- One row per conversation
CREATE TABLE threads (
  thread_id     UUID PRIMARY KEY,
  subject       TEXT,
  last_message_at TIMESTAMP,
  message_ids   LIST<TIMEUUID>  -- email_id values of the messages in the thread
);

Partition by user_id and folder together. All of a user's inbox emails live on one shard. "Show me inbox page 2" hits a single node. Without this, every inbox query fans out across the cluster, which is slow and expensive at 500 million users. You could have the most elegant schema in the world and still make page two of the inbox a cluster-wide broadcast.
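To make the access pattern concrete, here is a toy in-memory model of the emails table. It is a sketch, not a Cassandra client: each dictionary key plays the role of one partition, and the class and method names are made up for illustration.

```python
from bisect import insort
from collections import defaultdict


class EmailTable:
    """In-memory stand-in for the emails table: one sorted list per partition."""

    def __init__(self):
        # (user_id, folder) -> rows kept newest-first via a negated timestamp
        self.partitions = defaultdict(list)
        self.partitions_read = 0

    def insert(self, user_id, folder, sent_at, email_id, subject):
        insort(self.partitions[(user_id, folder)], (-sent_at, email_id, subject))

    def page(self, user_id, folder, page, page_size=2):
        """Return one page of (email_id, subject), newest first; page is 1-based."""
        self.partitions_read += 1          # exactly one partition per inbox query
        rows = self.partitions.get((user_id, folder), [])
        start = (page - 1) * page_size
        return [(eid, subj) for _, eid, subj in rows[start:start + page_size]]


table = EmailTable()
for i in range(1, 6):
    table.insert("user-a", "INBOX", sent_at=i, email_id=f"e{i}", subject=f"msg {i}")
table.insert("user-a", "SENT", sent_at=9, email_id="s1", subject="reply")

print(table.page("user-a", "INBOX", page=2))   # third and fourth newest inbox rows
```

One tradeoff worth mentioning if asked: because folder is part of the partition key, moving a message between folders is a delete in one partition plus an insert in another, not an in-place update.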

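Threading uses the standard email headers: Message-ID, In-Reply-To, and References (defined in RFC 5322). When a new message arrives, take the Message-ID from its In-Reply-To header (falling back to References) and look up which thread that parent belongs to. Match found: append to that thread. No match: create a new thread. Because the threads table is keyed by thread_id, this lookup needs a Message-ID to thread_id index alongside it. This is server-side threading, which stays consistent across all clients.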
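Thread resolution from headers might look like the sketch below. The thread_by_message_id dictionary is a hypothetical Message-ID to thread_id lookup standing in for whatever index the real system keeps. Trying References after In-Reply-To is a common fallback for when the direct parent was never received.

```python
import uuid


def resolve_thread(headers: dict, thread_by_message_id: dict) -> str:
    """Return the thread_id for a message and record its own Message-ID."""
    candidates = []
    if headers.get("In-Reply-To"):
        candidates.append(headers["In-Reply-To"].strip())
    # References lists ancestors oldest-first, so walk it from the nearest one.
    candidates.extend(reversed(headers.get("References", "").split()))

    for message_id in candidates:
        if message_id in thread_by_message_id:
            thread_id = thread_by_message_id[message_id]
            break
    else:
        thread_id = str(uuid.uuid4())      # no known ancestor: start a new thread

    thread_by_message_id[headers["Message-ID"]] = thread_id
    return thread_id


index = {}
root = resolve_thread({"Message-ID": "<a@example.com>"}, index)
reply = resolve_thread(
    {"Message-ID": "<b@example.com>", "In-Reply-To": "<a@example.com>",
     "References": "<a@example.com>"},
    index,
)
print(root == reply)
```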

Attachments go to object storage at key {user_id}/{email_id}/{filename}. The metadata row never stores attachment bytes.

Search Is a Read Replica, Not the Database

If Elasticsearch goes down, search breaks. The inbox doesn't. That's the deal.

Cassandra can filter on subject within a user partition, but that's a sequential scan. Real search needs an inverted index.

Write path: when a message is stored, the mail processor publishes an event to Kafka. A search indexer consumes it and writes to Elasticsearch. Index documents are scoped and routed per user, so a query like "all emails from a given sender" touches a single shard.

Query path: user searches "budget meeting Q3". API server queries Elasticsearch filtered by user_id. Returns email IDs. API server fetches metadata from Cassandra for rendering.

The indexing lag is typically under a second. If Elasticsearch is down, search shows an error but the inbox still loads from Cassandra. The search index is a read replica. Cassandra is the source of truth. Never let the search index get promoted to primary storage. The moment you start writing to Elasticsearch and only Elasticsearch, you've turned your read replica into a database without any of the durability guarantees that entails.
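The relationship between the two stores fits in a short sketch. Here a dictionary stands in for Cassandra, a list stands in for the Kafka topic, and a per-user inverted index stands in for Elasticsearch; all names are illustrative.

```python
import re
from collections import defaultdict

metadata_store = {}                  # (user_id, email_id) -> metadata; source of truth
inverted_index = defaultdict(set)    # (user_id, term) -> email_ids; derived, rebuildable
pending_events = []                  # stand-in for the Kafka topic


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def store_email(user_id, email_id, subject, from_addr):
    """Write path: persist metadata first, then publish an indexing event."""
    metadata_store[(user_id, email_id)] = {
        "email_id": email_id, "subject": subject, "from_addr": from_addr,
    }
    pending_events.append((user_id, email_id))


def run_indexer():
    """Async consumer: drain events and update the per-user inverted index."""
    while pending_events:
        user_id, email_id = pending_events.pop(0)
        meta = metadata_store[(user_id, email_id)]
        for term in tokenize(meta["subject"] + " " + meta["from_addr"]):
            inverted_index[(user_id, term)].add(email_id)


def search(user_id, query):
    """Query path: the index returns IDs, the metadata store renders them."""
    terms = tokenize(query)
    if not terms:
        return []
    ids = set.intersection(
        *(inverted_index.get((user_id, t), set()) for t in terms)
    )
    return [metadata_store[(user_id, i)] for i in sorted(ids)]


store_email("u1", "e1", "Q3 budget meeting", "alice@example.com")
print(search("u1", "budget"))    # indexer has not run yet
run_indexer()
print(search("u1", "budget meeting Q3"))
```

Because the index is built only from events about rows already in the metadata store, it can be dropped and rebuilt at any time, which is what makes it safe to treat as a replica.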

For a deeper look at this dual-write pattern and its tradeoffs, the key-value store design post covers the same consistency split for a different domain.

50 Million Open Sockets

Real-time WebSocket routing: load balancer uses consistent hashing on user_id for sticky sessions; real-time servers subscribe to per-user Redis pub/sub channels; mail processor publishes notifications; memory math shows 50M connections × 64KB = 3TB across 100-200 servers

Fifty million open TCP connections sounds terrifying. The math says 100 to 200 servers. Breathe.

At 500 million users, assume 10% are active at any time: 50 million concurrent WebSocket connections. Each connection uses about 64KB of RAM, so 50 million connections is about 3TB across the real-time server fleet. That's manageable with 100 to 200 servers at 32GB each.
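Worked through in Python, using binary units for RAM (64KB = 65,536 bytes). The 50% headroom figure is an assumption added here to show where the upper end of the server range comes from.

```python
USERS = 500_000_000
ACTIVE_PERCENT = 10                 # share of users connected at once (from the text)
BYTES_PER_CONN = 64 * 1024          # ~64KB of RAM per WebSocket connection
SERVER_RAM = 32 * 1024**3           # 32GB per real-time server
USABLE_PERCENT = 50                 # assumed share of RAM available for connections


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


connections = USERS * ACTIVE_PERCENT // 100
total_bytes = connections * BYTES_PER_CONN
servers_full_ram = ceil_div(total_bytes, SERVER_RAM)
servers_with_headroom = ceil_div(total_bytes, SERVER_RAM * USABLE_PERCENT // 100)

print(f"{connections:,} connections, {total_bytes / 1e12:.2f} TB of connection state")
print(f"{servers_full_ram} servers if all RAM went to sockets, "
      f"{servers_with_headroom} with {USABLE_PERCENT}% headroom")
```

That works out to roughly 260,000 to 520,000 connections per server, so file descriptor limits and kernel tuning matter as much as RAM.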

The real-time servers need sticky routing. User A's WebSocket connection must always land on the same server instance. Use a load balancer with consistent hashing on user_id. When the mail processor stores a new message for user A, it publishes to a Redis pub/sub channel. The real-time server holding user A's connection subscribes to that channel and pushes the notification.
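A minimal consistent hash ring shows how user_id maps to a real-time server and why adding a server only moves a fraction of users. This is a sketch: the server names, the virtual-node count, and the choice of SHA-256 are assumptions, not what any particular load balancer does.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        # Each server appears at `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def server_for(self, user_id: str) -> str:
        """Walk clockwise from the user's hash to the next server point."""
        idx = bisect.bisect(self._points, self._hash(user_id)) % len(self._ring)
        return self._ring[idx][1]


users = [f"user-{i}" for i in range(1000)]
before = HashRing(["rt-1", "rt-2", "rt-3", "rt-4"])
after = HashRing(["rt-1", "rt-2", "rt-3", "rt-4", "rt-5"])
moved = [u for u in users if before.server_for(u) != after.server_for(u)]
print(f"{len(moved)} of {len(users)} users moved after adding rt-5")
```

With plain modulo hashing, going from four servers to five would remap most users and force a reconnect storm. On the ring, the only users that move are the ones the new server takes over.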

For mobile clients or flaky connections, fall back to long polling: client holds a 30-second HTTP request open. Server responds when new mail arrives or when the timeout fires. Same pub/sub backend, simpler client. See the push vs pull tradeoff analysis for the full latency versus connection cost breakdown.
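The long-poll handler reduces to "wait on the user's event queue with a timeout." In this sketch a queue.Queue stands in for the per-user pub/sub subscription, and the status codes in the returned dictionary are an illustrative convention.

```python
import queue
import threading


def long_poll(events: queue.Queue, timeout_s: float = 30.0) -> dict:
    """Block until an event arrives or the timeout fires, then respond."""
    try:
        return {"status": 200, "event": events.get(timeout=timeout_s)}
    except queue.Empty:
        return {"status": 204, "event": None}    # nothing new; client polls again


user_events = queue.Queue()
# Simulate the mail processor publishing shortly after the client starts waiting.
threading.Timer(0.05, user_events.put, args=("new message in INBOX",)).start()
print(long_poll(user_events, timeout_s=2.0))
```

The client loop is the same in both outcomes: handle the response, then immediately issue the next request.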

This Is the One Place You Choose CP Over AP

Most distributed systems choose AP for consumer-facing features. Email is the exception. And when you say this out loud in an interview, interviewers sit up a little straighter.

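Use a single primary node per mailbox. All writes for user X route through one node. Replicas handle reads. During a primary failover, writes block briefly rather than accepting conflicting updates from two nodes. This is the right call because conflicting updates to a mailbox (two nodes disagree on whether a message was read) are very hard to reconcile and extremely visible to users. Cassandra itself is leaderless, so in practice this means either putting a per-mailbox owner in front of it or using QUORUM reads and writes (or lightweight transactions) for mailbox-state updates.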
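A toy model of the CP choice: writes succeed only through the current primary and fail fast while a failover is in progress. It is a sketch of the behavior, not of leader election; the class, node names, and exception are hypothetical.

```python
class Unavailable(Exception):
    """Raised when a write cannot be accepted safely."""


class Mailbox:
    def __init__(self, primary: str):
        self.primary = primary      # None while a failover is in progress
        self.is_read = {}           # email_id -> bool

    def begin_failover(self):
        self.primary = None

    def complete_failover(self, new_primary: str):
        self.primary = new_primary

    def mark_read(self, via_node: str, email_id: str):
        if self.primary is None:
            raise Unavailable("failover in progress, retry shortly")
        if via_node != self.primary:
            raise Unavailable(f"not the primary, route to {self.primary}")
        self.is_read[email_id] = True


mailbox = Mailbox(primary="node-a")
mailbox.mark_read("node-a", "e1")
mailbox.begin_failover()
try:
    mailbox.mark_read("node-a", "e2")
except Unavailable as exc:
    print("write rejected:", exc)
mailbox.complete_failover("node-b")
mailbox.mark_read("node-b", "e2")
print(mailbox.is_read)
```

The client sees a brief error or retry during failover, which is the price of never having two nodes disagree about read state.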

The distributed cache design covers how to handle leader election and replica lag if you want the failure mode details.

For search and inbox rendering, AP is fine. A stale inbox view is annoying. A stale unread count that never corrects is a bug your users will file a ticket about and then tweet about.

Pacing Your Email System Design Interview

0-5: Clarify scope (consumer vs transactional), core features, scale targets. State the two-flow insight.
5-10: Estimation. Storage volume, email throughput (QPS), concurrent connections.
10-20: Draw the six-box architecture. Walk through send flow, then receive flow.
20-30: Data model. Partition key rationale, why body goes to object store, threading headers.
30-40: Deep dive on one hard problem. Good choices: search (Elasticsearch dual-write, consistency lag), real-time sync (WebSocket scaling, pub/sub routing), or SMTP delivery reliability (queue, retry, bounce handling).
40-45: Tradeoffs. CP vs AP for mailbox state. Elasticsearch as read replica. Attachment size policy. What changes at 10x scale.

The interviewer will often direct the deep dive. If they don't, pick search or real-time sync. Both let you show layered thinking: consistency tradeoffs, async pipelines, failure handling. Both also have the nice property of being problems you can explain without needing to invent anything. The solutions exist. You just need to know them and know why.

Recap

Email is an outbound pipeline plus an inbound pipeline plus an HTTP/WebSocket API above both. Design all three.
Partition mailbox data by (user_id, folder) so inbox queries hit one node.
Email bodies and attachments belong in object storage. Metadata belongs in Cassandra with a pointer to the body.
Dual-write to Elasticsearch for search. It's a read replica, not the source of truth.
Real-time inbox updates use WebSocket with Redis pub/sub for notification routing. Mobile falls back to long polling.
SPF, DKIM, and DMARC run in the mail processor, behind the inbound gateway, before any storage write.
Choose single-primary-per-mailbox (CP) for mailbox state. This is the one consumer feature where consistency beats availability.
The queue between inbound gateway and mail processor is the durability guarantee. The queue between API server and outbound SMTP workers is the delivery reliability guarantee.
