Design a News Feed System: What the Interview Is Really Testing

June 11, 202612 min read
System DesignInterview PrepBackend EngineeringFeed Algorithms
Design a News Feed System: What the Interview Is Really Testing
TL;DR
  • Two-pipeline model: write-path fanout and read-path ranking are separate subsystems with separate tradeoffs — conflating them is the fastest way to fail the deep dive
  • Hybrid fanout beats pure push or pull: push for users under 10K followers, pull for celebrities, merge the two at read time
  • Three-pass ranking pipeline: cheap filter cuts 2,000 candidates to ~500, neural net scores P(like/comment/share), diversity rules prevent feed homogeneity
  • Redis sorted sets store the top 500 scored post IDs per user, absorbing 90%+ of read traffic with zero database hits
  • Cursor-based pagination keeps page boundaries stable as scores shift; offset pagination breaks on any live feed
  • The clarifying question that decides everything: ranked vs chronological and max follower count before you draw a single box
  • Three signals interviewers listen for: celebrity problem spotted without prompting, hybrid fanout justified, ranking described as a multi-stage pipeline not a sort

You've got 45 minutes. Most candidates spend 30 of them on the obvious parts: how to store posts, how to model follows, what endpoints to expose. Then they run out of time right before the part the interviewer actually cares about: how do you decide which 20 posts out of 2,000 candidates a user should see first?

That's the whole game. The storage is a solved problem. The ranking is not.

Full architecture, requirements to bottlenecks, with the ranking pipeline front and center.


Two Pipelines, Not One

A news feed system has two completely separate jobs. First: collect all posts a user could theoretically see. Second: decide which ones to show. These are different subsystems with different tradeoffs, and conflating them is the fastest way to produce a design that falls apart under questioning.

Think of it as two pipelines:

  • The write path runs when a user posts. It fans out the new post to relevant feed caches.
  • The read path runs when a user opens the app. It fetches candidates and ranks them.

Two-pipeline overview: write path (left) shows User Posts to Post Service to Fanout Service to Kafka Queue to Feed Workers to Redis Feed Cache; read path (right) shows User Opens App to Feed Service to Redis Feed Cache to Ranking Service to Ranked Response. Both paths share only the cache.

The two pipelines share exactly one thing: the cache. Everything else is separate.

Keep these paths separate throughout the interview. The interviewer is tracking the same division.


Nail the Requirements First

Before drawing a single box, ask these questions out loud. They constrain every tradeoff that follows. Candidates who skip this step end up defending a chronological sort when the interviewer wanted ML ranking. Awkward.

Functional requirements:

  • Users can follow other users
  • Users can publish posts (text, images, links)
  • Users see a ranked feed of posts from followed accounts, paginated (20 posts per page, infinite scroll)

Non-functional requirements:

  • 300 million daily active users
  • Feed load time under 500ms at p99
  • New posts appear in followers' feeds within 5 seconds of publishing
  • 99.9% availability; eventual consistency is fine

The clarifying question that decides everything: "Is the feed chronological or ranked?" A chronological feed is a sort. A ranked feed is a multi-stage pipeline. They look similar on paper and are wildly different in implementation. Ask it before you design anything.

Also ask: "How many followers can a celebrity user have?" The answer determines whether you need a hybrid fanout approach, which you almost certainly do.


High-Level Architecture

Five components carry the full system:

  1. Post Service handles post creation and writes to the Posts database
  2. Fanout Service determines which feed caches to update after each post
  3. Message Queue (Kafka or SQS) decouples fanout from the write path so a single post doesn't create latency spikes
  4. Feed Cache (Redis) stores pre-computed feed lists per user as sorted sets
  5. Feed Service handles the read path: fetches from cache, runs ranking, returns the response

Full architecture diagram: write path flows Client to API Gateway to Post Service, then branches to Posts DB (Cassandra) and Kafka Queue, then Fanout Workers to Redis Feed Cache. Read path flows Client to API Gateway to Feed Service to Redis Feed Cache to Ranking Service to Ranked Feed. The shared Redis cache connects both paths. Celebrity posts store sits off to the side, merged at read time.

Write path pre-computes. Read path consumes. They share only the cache.

State what you're optimizing for early. Reads outnumber writes roughly 100:1 on a social feed. Pre-compute aggressively on the write side. Accept slightly stale data. Every architectural choice flows from this.


Three Tables, Two Patterns

Three tables do most of the work.

posts ( post_id UUID PRIMARY KEY, author_id UUID NOT NULL, content TEXT, media_url TEXT, created_at TIMESTAMP, like_count INT DEFAULT 0, comment_count INT DEFAULT 0 ) follows ( follower_id UUID, followee_id UUID, created_at TIMESTAMP, PRIMARY KEY (follower_id, followee_id) )

For the feed cache, use Redis sorted sets. The score is a composite of recency and predicted engagement:

ZADD feed:{user_id} {composite_score} {post_id}
ZREVRANGE feed:{user_id} 0 19   # top 20 by score

Sorted sets give you O(log n) insert and O(1) range reads. That's the shape you want for a feed that updates constantly and gets read constantly.

Posts live in Cassandra or a sharded relational database. The write volume from fanout operations (a single post touching millions of caches) makes Cassandra's write-optimized log-structured merge tree attractive. The follows table can live in PostgreSQL with indexes on both columns.


Fanout on Write vs Fanout on Read: Push, Pull, or Hybrid

This is where most candidates get tripped up. There are three approaches. Two of them are wrong by themselves.

Fan-out on Write (Push): When a post is created, immediately write it to every follower's feed cache. Reading the feed is just a Redis lookup: O(1). The cost is O(followers) writes per post. For a normal user with 300 followers, fine. For a user with 10 million followers, a single post generates 10 million Redis writes, synchronously, on the hot path. Beyoncé posts and your system goes down. Not great.

Fan-out on Read (Pull): When a user opens their feed, query the recent posts of every account they follow in real time. No pre-computation, works fine for celebrities, data is always fresh. The cost is O(following_count) database reads per page load. A user following 1,000 accounts triggers 1,000 reads before you can start ranking. Your p99 climbs into second-tier embarrassment territory.

Hybrid: Push for regular users (fewer than 10K followers), pull for celebrities. At read time, merge the pre-computed Redis feed with real-time celebrity posts, then rank the combined set.

Hybrid fanout decision tree: a new post enters the Fanout Service and hits a decision diamond on follower count. Under 10K followers goes left: fan out to all follower caches in Redis. Over 10K (celebrity) goes right: store in Celebrity Posts Store without fanout. At read time, both branches merge into a single ranked response.

Push for regulars, pull for celebrities, merge at read time. The user sees one feed.

The 10K threshold isn't magic. It's approximately the inflection point where per-post fanout cost exceeds per-read fetch cost. Tune it based on your observed traffic.

The merge step at read time hides the seam entirely from the user. They see one ranked feed regardless of whether each post came from cache or a live query.


Ranking: From Score to Pipeline

Once you have candidates, you need to order them. There are three levels of sophistication, and you should describe at least Level 2 in an interview.

Level 1: Chronological. Sort by created_at. Simple, predictable, and what Twitter used before its algorithmic timeline. Completely valid if the interview scope is limited. Also the feed equivalent of assembling IKEA furniture with no tools: technically works, nobody's impressed.

Level 2: Engagement-weighted score with time decay. Compute a composite score for each post:

import math def score(post, user, now): age_hours = (now - post.created_at).total_seconds() / 3600 time_decay = math.exp(-0.1 * age_hours) # halves every ~7 hours engagement = post.likes * 1 + post.comments * 2 + post.shares * 3 affinity = interaction_frequency(user.id, post.author_id) return engagement * time_decay * affinity

Time decay with λ = 0.1 keeps the feed fresh without making it purely recency-based. A post with 10,000 likes ages out gracefully rather than disappearing the moment something newer arrives.

Level 3: Multi-stage ML pipeline. Facebook's production system runs three passes:

  • Pass 0 (filtering): A cheap model cuts 2,000 candidates to ~500. Removes already-seen posts, blocked users, muted topics. Speed matters more than precision here.
  • Pass 1 (neural scoring): A neural network predicts P(like), P(comment), P(share) for each of the 500 candidates. The final score is a weighted sum: 0.2 * P(like) + 0.5 * P(comment) + 1.0 * P(share). Shares get the highest weight because they're the strongest engagement signal.
  • Pass 2 (diversification): Rule-based pass prevents clustering. No more than three consecutive posts from the same author. Cap videos at 30% of the feed. Inject trending content. This is where business logic lives, separate from the ML model.

Three-pass ranking pipeline: ~2,000 candidates enter Pass 0 (cheap filter: seen posts, blocked users, muted topics). ~500 survive to Pass 1 (neural net scoring: P(like) x 0.2, P(comment) x 0.5, P(share) x 1.0, affinity). Ranked results enter Pass 2 (diversity rules: no 3 consecutive same author, videos under 30%, inject trending). 20 posts come out the other end.

Pass 0 exists because running a full neural network on 2,000 posts per request costs too much. Pass 2 exists because a pure ML score produces filter bubbles. Both are real constraints, not padding.

You get the most interview credit for explaining why each pass exists, not just listing them. Pass 0 exists because running the full neural network on 2,000 posts per request is too expensive. Pass 2 exists because a pure ML score produces homogeneous feeds. These are real engineering constraints, not arbitrary steps.


Caching: What to Pre-Compute

Pre-compute feeds for active users only. Users inactive for 30 or more days get no pre-computed feed. Compute it lazily when they log back in.

Each user's feed cache holds the top 500 scored post IDs, not the full post content. The Feed Service fetches 500 IDs, re-ranks in-process (fast, no DB round-trip needed), and returns the top 20. This pattern absorbs 90%+ of read traffic with zero database access.

For viral posts, a single post ID lands in millions of users' feed caches. This creates a hot key problem. Replicate the post content across multiple Redis nodes rather than trying to shard by post ID. Each node serves a subset of users, no coordination needed.


Tradeoffs Worth Naming Explicitly

Naming tradeoffs explicitly is what separates a senior answer from a junior one. Cover these:

DecisionWhat to pickWhy
FanoutHybrid push/pullPush for regulars, pull for celebrities, merge at read time
RankingLevel 2+ composite scoreChronological doesn't retain users at scale
PaginationCursor-based, not offsetOffset breaks as the feed updates under pagination
Feed storageRedis sorted setO(log n) write, O(1) range read
Post storageCassandra or DynamoDBWrite-heavy fanout pattern

Cursor-based pagination deserves a specific mention. Offset pagination returns page 2 of "posts sorted by score" but the scores change constantly. A cursor encodes your position relative to a specific post, so the page boundary stays stable even as new posts arrive. Offset pagination on a live-ranked feed is like bookmarking a river.


How to Spend the 45 Minutes

Interviewers notice when you spend 30 minutes on database schemas and forget to mention ranking. Here's the clock:

  • 0 to 5 min: Requirements clarification. Ask about ranking vs chronological, max follower count, content types.
  • 5 to 10 min: Sketch the two-pipeline mental model. Label write path and read path separately before adding any boxes.
  • 10 to 20 min: Data model and APIs. Three tables, three endpoints (POST /posts, POST /users/:id/follow, GET /feed?cursor=).
  • 20 to 30 min: Fanout deep dive. Explain push vs pull, land on hybrid, name the threshold, describe the read-time merge.
  • 30 to 40 min: Ranking deep dive. Walk through the scoring formula or the three-pass ML pipeline.
  • 40 to 45 min: Scaling and tradeoffs. Celebrity problem, hot keys, cursor pagination.

The three signals interviewers listen for: you identify the celebrity problem without being prompted, you explain why hybrid fanout is better than either pure approach, and you describe ranking as a multi-stage pipeline rather than a single sort. Get those three right and you are in strong hire territory at most companies.

Articulating this under time pressure, with someone pushing back on every choice, is much harder than it reads on paper. If you want to practice the full walkthrough with live questioning, SpaceComplexity runs voice-based mock system design sessions with rubric-based feedback. Most candidates discover they can sketch the fanout correctly but fumble when asked to defend their ranking formula.


Recap

  • Two pipelines: write-path fanout and read-path ranking are separate subsystems with separate tradeoffs
  • Clarify ranked vs chronological and ask about max follower count before designing anything
  • Hybrid fanout: push for users with fewer than 10K followers, pull for celebrities, merge at read time
  • Ranking: engagement score with time decay at minimum; three-pass ML pipeline for production
  • Pre-compute feeds for active users only; top 500 IDs in Redis sorted sets per user
  • Cursor-based pagination keeps page boundaries stable as scores update
  • Name your tradeoffs explicitly; that's what separates a senior answer

Further Reading