Design a News Feed System: What the Interview Is Really Testing

You've got 45 minutes. Most candidates spend 30 of them on the obvious parts: how to store posts, how to model follows, what endpoints to expose. Then they run out of time right before the part the interviewer actually cares about: how do you decide which 20 posts out of 2,000 candidates a user should see first?

That's the whole game. The storage is a solved problem. The ranking is not.

Full architecture, requirements to bottlenecks, with the ranking pipeline front and center.

Two Pipelines, Not One

A news feed system has two completely separate jobs. First: collect all posts a user could theoretically see. Second: decide which ones to show. These are different subsystems with different tradeoffs, and conflating them is the fastest way to produce a design that falls apart under questioning.

Think of it as two pipelines:

The write path runs when a user posts. It fans out the new post to relevant feed caches.
The read path runs when a user opens the app. It fetches candidates and ranks them.

The two pipelines share exactly one thing: the cache. Everything else is separate.

Keep these paths separate throughout the interview. The interviewer is tracking the same division.

Nail the Requirements First

Before drawing a single box, ask these questions out loud. They constrain every tradeoff that follows. Candidates who skip this step end up defending a chronological sort when the interviewer wanted ML ranking. Awkward.

Functional requirements:

Users can follow other users
Users can publish posts (text, images, links)
Users see a ranked feed of posts from followed accounts, paginated (20 posts per page, infinite scroll)

Non-functional requirements:

300 million daily active users
Feed load time under 500ms at p99
New posts appear in followers' feeds within 5 seconds of publishing
99.9% availability; eventual consistency is fine

The clarifying question that decides everything: "Is the feed chronological or ranked?" A chronological feed is a sort. A ranked feed is a multi-stage pipeline. They look similar on paper and are wildly different in implementation. Ask it before you design anything.

Also ask: "How many followers can a celebrity user have?" The answer determines whether you need a hybrid fanout approach, which you almost certainly do.

High-Level Architecture

Five components carry the full system:

Post Service handles post creation and writes to the Posts database
Fanout Service determines which feed caches to update after each post
Message Queue (Kafka or SQS) decouples fanout from the write path so a single post doesn't create latency spikes
Feed Cache (Redis) stores pre-computed feed lists per user as sorted sets
Feed Service handles the read path: fetches from cache, runs ranking, returns the response

Write path pre-computes. Read path consumes. They share only the cache.

State what you're optimizing for early. Reads outnumber writes roughly 100:1 on a social feed. Pre-compute aggressively on the write side. Accept slightly stale data. Every architectural choice flows from this.

Three Tables, Two Patterns

Three tables do most of the work.

posts (
  post_id       UUID PRIMARY KEY,
  author_id     UUID NOT NULL,
  content       TEXT,
  media_url     TEXT,
  created_at    TIMESTAMP,
  like_count    INT DEFAULT 0,
  comment_count INT DEFAULT 0
)

follows (
  follower_id  UUID,
  followee_id  UUID,
  created_at   TIMESTAMP,
  PRIMARY KEY (follower_id, followee_id)
)

For the feed cache, use Redis sorted sets. The score is a composite of recency and predicted engagement:

ZADD feed:{user_id} {composite_score} {post_id}
ZREVRANGE feed:{user_id} 0 19   # top 20 by score

Sorted sets give you O(log n) insert and O(1) range reads. That's the shape you want for a feed that updates constantly and gets read constantly.

Posts live in Cassandra or a sharded relational database. The write volume from fanout operations (a single post touching millions of caches) makes Cassandra's write-optimized log-structured merge tree attractive. The follows table can live in PostgreSQL with indexes on both columns.

Fanout on Write vs Fanout on Read: Push, Pull, or Hybrid

This is where most candidates get tripped up. There are three approaches. Two of them are wrong by themselves.

Fan-out on Write (Push): When a post is created, immediately write it to every follower's feed cache. Reading the feed is just a Redis lookup: O(1). The cost is O(followers) writes per post. For a normal user with 300 followers, fine. For a user with 10 million followers, a single post generates 10 million Redis writes, synchronously, on the hot path. Beyoncé posts and your system goes down. Not great.

Fan-out on Read (Pull): When a user opens their feed, query the recent posts of every account they follow in real time. No pre-computation, works fine for celebrities, data is always fresh. The cost is O(following_count) database reads per page load. A user following 1,000 accounts triggers 1,000 reads before you can start ranking. Your p99 climbs into second-tier embarrassment territory.

Hybrid: Push for regular users (fewer than 10K followers), pull for celebrities. At read time, merge the pre-computed Redis feed with real-time celebrity posts, then rank the combined set.

Push for regulars, pull for celebrities, merge at read time. The user sees one feed.

The 10K threshold isn't magic. It's approximately the inflection point where per-post fanout cost exceeds per-read fetch cost. Tune it based on your observed traffic.

The merge step at read time hides the seam entirely from the user. They see one ranked feed regardless of whether each post came from cache or a live query.

Ranking: From Score to Pipeline

Once you have candidates, you need to order them. There are three levels of sophistication, and you should describe at least Level 2 in an interview.

Level 1: Chronological. Sort by created_at. Simple, predictable, and what Twitter used before its algorithmic timeline. Completely valid if the interview scope is limited. Also the feed equivalent of assembling IKEA furniture with no tools: technically works, nobody's impressed.

Level 2: Engagement-weighted score with time decay. Compute a composite score for each post:

import math

def score(post, user, now):
    age_hours = (now - post.created_at).total_seconds() / 3600
    time_decay = math.exp(-0.1 * age_hours)  # halves every ~7 hours
    engagement = post.likes * 1 + post.comments * 2 + post.shares * 3
    affinity = interaction_frequency(user.id, post.author_id)
    return engagement * time_decay * affinity

Time decay with λ = 0.1 keeps the feed fresh without making it purely recency-based. A post with 10,000 likes ages out gracefully rather than disappearing the moment something newer arrives.

Level 3: Multi-stage ML pipeline. Facebook's production system runs three passes:

Pass 0 (filtering): A cheap model cuts 2,000 candidates to ~500. Removes already-seen posts, blocked users, muted topics. Speed matters more than precision here.
Pass 1 (neural scoring): A neural network predicts P(like), P(comment), P(share) for each of the 500 candidates. The final score is a weighted sum: 0.2 * P(like) + 0.5 * P(comment) + 1.0 * P(share). Shares get the highest weight because they're the strongest engagement signal.
Pass 2 (diversification): Rule-based pass prevents clustering. No more than three consecutive posts from the same author. Cap videos at 30% of the feed. Inject trending content. This is where business logic lives, separate from the ML model.

Pass 0 exists because running a full neural network on 2,000 posts per request costs too much. Pass 2 exists because a pure ML score produces filter bubbles. Both are real constraints, not padding.

You get the most interview credit for explaining why each pass exists, not just listing them. Pass 0 exists because running the full neural network on 2,000 posts per request is too expensive. Pass 2 exists because a pure ML score produces homogeneous feeds. These are real engineering constraints, not arbitrary steps.

Caching: What to Pre-Compute

Pre-compute feeds for active users only. Users inactive for 30 or more days get no pre-computed feed. Compute it lazily when they log back in.

Each user's feed cache holds the top 500 scored post IDs, not the full post content. The Feed Service fetches 500 IDs, re-ranks in-process (fast, no DB round-trip needed), and returns the top 20. This pattern absorbs 90%+ of read traffic with zero database access.

For viral posts, a single post ID lands in millions of users' feed caches. This creates a hot key problem. Replicate the post content across multiple Redis nodes rather than trying to shard by post ID. Each node serves a subset of users, no coordination needed.

Tradeoffs Worth Naming Explicitly

Naming tradeoffs explicitly is what separates a senior answer from a junior one. Cover these:

Decision	What to pick	Why
Fanout	Hybrid push/pull	Push for regulars, pull for celebrities, merge at read time
Ranking	Level 2+ composite score	Chronological doesn't retain users at scale
Pagination	Cursor-based, not offset	Offset breaks as the feed updates under pagination
Feed storage	Redis sorted set	O(log n) write, O(1) range read
Post storage	Cassandra or DynamoDB	Write-heavy fanout pattern

Cursor-based pagination deserves a specific mention. Offset pagination returns page 2 of "posts sorted by score" but the scores change constantly. A cursor encodes your position relative to a specific post, so the page boundary stays stable even as new posts arrive. Offset pagination on a live-ranked feed is like bookmarking a river.

How to Spend the 45 Minutes

Interviewers notice when you spend 30 minutes on database schemas and forget to mention ranking. Here's the clock:

0 to 5 min: Requirements clarification. Ask about ranking vs chronological, max follower count, content types.
5 to 10 min: Sketch the two-pipeline mental model. Label write path and read path separately before adding any boxes.
10 to 20 min: Data model and APIs. Three tables, three endpoints (POST /posts, POST /users/:id/follow, GET /feed?cursor=).
20 to 30 min: Fanout deep dive. Explain push vs pull, land on hybrid, name the threshold, describe the read-time merge.
30 to 40 min: Ranking deep dive. Walk through the scoring formula or the three-pass ML pipeline.
40 to 45 min: Scaling and tradeoffs. Celebrity problem, hot keys, cursor pagination.

The three signals interviewers listen for: you identify the celebrity problem without being prompted, you explain why hybrid fanout is better than either pure approach, and you describe ranking as a multi-stage pipeline rather than a single sort. Get those three right and you are in strong hire territory at most companies.

Articulating this under time pressure, with someone pushing back on every choice, is much harder than it reads on paper. If you want to practice the full walkthrough with live questioning, SpaceComplexity runs voice-based mock system design sessions with rubric-based feedback. Most candidates discover they can sketch the fanout correctly but fumble when asked to defend their ranking formula.

Recap

Two pipelines: write-path fanout and read-path ranking are separate subsystems with separate tradeoffs
Clarify ranked vs chronological and ask about max follower count before designing anything
Hybrid fanout: push for users with fewer than 10K followers, pull for celebrities, merge at read time
Ranking: engagement score with time decay at minimum; three-pass ML pipeline for production
Pre-compute feeds for active users only; top 500 IDs in Redis sorted sets per user
Cursor-based pagination keeps page boundaries stable as scores update
Name your tradeoffs explicitly; that's what separates a senior answer