Instagram System Design Interview: The 45-Minute Playbook

"Design Instagram" sounds friendly until you realize you have 45 minutes to cover a system that serves 2 billion users, stores hundreds of petabytes of media, and generates a personalized feed for 500 million people every single day. Most candidates either try to boil the ocean or build a monolith that would fall over at 10,000 users.

The winning move in any Instagram system design interview is knowing which three components your interviewer actually cares about and driving the conversation there before time runs out. That's what this walkthrough does.

Nail the Requirements Before You Touch a Diagram

Don't immediately draw boxes. Spend 5 minutes forcing the interviewer to be specific. The blank whiteboard will still be there when you're done.

Ask these:

Are we building a read-heavy social feed (Instagram) or a write-heavy media platform (TikTok-style)?
Should the feed be real-time (Twitter) or tolerate a few seconds of staleness (Facebook)?
Are we scoping in Stories, Reels, DMs, or just the core photo feed?
Is geographic distribution required (CDN), or are we US-only?

For this walkthrough, we commit to the core product:

Functional requirements:

Users can upload photos and short videos
Users can follow and unfollow other users
Users see a personalized, reverse-chronological home feed
Users can like and comment on posts
Users can view any public profile and its posts

Non-functional requirements:

Highly available: feed loads even when some services are degraded
Feed reads under 200ms p99
Photo uploads eventually consistent; new posts can take a few seconds to appear
500 million daily active users

Scoping is your first interview signal. Candidates who ask nothing and immediately design everything usually under-deliver on every component.

The Numbers That Shape Your Architecture

You cannot pick databases and caching strategies without running the numbers first. Interviewers reward candidates who derive their choices from real estimates, not instinct. Show your math.

Assume 2B MAU, 500M DAU. Each active user views their feed 10 times a day and uploads 1 post every 10 days.

Write QPS (photo uploads): 500M users / 10 days per post / 86,400 seconds = ~580 writes/second. Call it 1,000/s with headroom.

Read QPS (feed loads): 500M DAU × 10 feed refreshes / 86,400s = ~58,000 reads/second. Feed reads outnumber raw writes by about 100× (58,000 vs. 580), and by roughly 60× even against the padded 1,000/s figure.

Storage: ~50M photo uploads/day × 2 MB average = 100 TB/day. After processing multiple resolutions (thumbnail, medium, full), call it 300 TB/day. In a year: 300 TB × 365 ≈ 110 PB. This forces object storage plus a CDN. No file system survives this.
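The estimates above can be reproduced in a few lines. This is a minimal sketch using only the assumptions already stated (500M DAU, 10 feed views per day, one post every 10 days, 2 MB per photo); the 3× resolution multiplier is the one implied by rounding 100 TB/day up to 300 TB/day, and the function and key names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau=500_000_000, feed_views_per_day=10, days_between_posts=10,
             avg_photo_mb=2, resolution_multiplier=3):
    """Back-of-envelope capacity numbers for the core photo feed."""
    uploads_per_day = dau / days_between_posts
    write_qps = uploads_per_day / SECONDS_PER_DAY
    read_qps = dau * feed_views_per_day / SECONDS_PER_DAY
    # Decimal units: 1 TB = 1,000,000 MB and 1 PB = 1,000 TB.
    raw_tb_per_day = uploads_per_day * avg_photo_mb / 1_000_000
    stored_tb_per_day = raw_tb_per_day * resolution_multiplier
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "read_write_ratio": read_qps / write_qps,
        "raw_tb_per_day": raw_tb_per_day,
        "stored_tb_per_day": stored_tb_per_day,
        "stored_pb_per_year": stored_tb_per_day * 365 / 1_000,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Changing one input (say, 20 feed views per day) and rerunning is a quick way to show an interviewer which numbers your design is sensitive to.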

What Goes Where: High-Level Architecture

Figure: High-level Instagram system architecture showing client, API gateway, five microservices, Kafka event bus, and three data layers.

Five independently scalable services, one message bus, one clear read/write separation.

Five core services, each independently scalable:

User Service handles authentication, profiles, and the social graph (follows/followers)
Post Service handles media upload orchestration and post metadata
Feed Service generates and serves the home feed
Media Processing Service transcodes videos and creates multiple image resolutions
Notification Service delivers push notifications asynchronously

All five sit behind an API gateway that handles rate limiting, auth token validation, and routing. A Kafka message bus connects them: the Post Service publishes events; Feed Service and Notification Service consume them.

Media never goes through your application servers. The client gets a presigned S3 URL from the Post Service, uploads directly to object storage, and the CDN handles distribution. Your servers only move a short URL string, not gigabytes of pixels.

Your Data Model Is Your Architecture

Get the data model wrong and every scaling decision downstream is fighting the schema.

Users (PostgreSQL, sharded by user_id):

user_id      UUID PRIMARY KEY
username     VARCHAR(30) UNIQUE
email        VARCHAR(255) UNIQUE
password_hash VARCHAR(60)
bio          TEXT
profile_pic_url TEXT
created_at   TIMESTAMP

Posts (PostgreSQL, same shard as user):

post_id      UUID PRIMARY KEY
user_id      UUID
media_url    TEXT
caption      TEXT
created_at   TIMESTAMP
status       ENUM('pending', 'published', 'deleted')

Follows (PostgreSQL, indexed both ways):

follower_id  UUID
followee_id  UUID
created_at   TIMESTAMP
PRIMARY KEY (follower_id, followee_id)
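"Indexed both ways" means the table answers two questions cheaply: "who does X follow?" (covered by the primary key) and "who follows X?" (which needs a second index led by followee_id). The sketch below shows the idea with Python's built-in sqlite3 as a stand-in for PostgreSQL. The index name, the helper functions, and the use of TEXT ids instead of UUIDs are all assumptions made for illustration.

```python
import sqlite3


def make_follows_db():
    """Create an in-memory follows table with lookups in both directions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE follows (
            follower_id TEXT NOT NULL,
            followee_id TEXT NOT NULL,
            created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (follower_id, followee_id)   -- "who does X follow?"
        );
        -- Reverse index (hypothetical name) for "who follows X?"
        CREATE INDEX idx_follows_by_followee
            ON follows (followee_id, follower_id);
    """)
    return conn


def follow(conn, follower_id, followee_id):
    # INSERT OR IGNORE is SQLite syntax; PostgreSQL would use
    # INSERT ... ON CONFLICT DO NOTHING for the same idempotent follow.
    conn.execute(
        "INSERT OR IGNORE INTO follows (follower_id, followee_id) VALUES (?, ?)",
        (follower_id, followee_id),
    )


def following(conn, user_id):
    rows = conn.execute(
        "SELECT followee_id FROM follows WHERE follower_id = ? ORDER BY followee_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers(conn, user_id):
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The fan-out path reads followers of the author, while the feed read path reads the accounts a user follows, so both directions sit on a hot path.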

Likes and Comments (Cassandra):

-- Likes partitioned by post_id, clustered by created_at
-- Comments partitioned by post_id, sorted by created_at DESC

Likes and comments are high-volume, append-heavy, and never need relational joins. A suitably sized Cassandra cluster absorbs 100,000 writes/second because each write is an append spread across partitions. In PostgreSQL, every update to a hot post's row creates a new row version under MVCC, and that write amplification hurts during peak hours.

Figure: Data layer breakdown: PostgreSQL for relational data, Cassandra for engagement data, Redis for feed hot path.

The decision rule: need joins or ACID, use PostgreSQL. Need 100k writes/s and no joins, use Cassandra. Need it in 1ms, use Redis.

Feed Generation Is the Hard Part

This is where interviews are won or lost. Feed generation has three possible architectures, and the right answer for Instagram is all three at once.

Fan-out on Write (Push Model)

When a user posts, immediately write the post ID to every follower's feed table. When a follower opens the app, their feed is already pre-computed.

User A posts photo
  → Kafka: "post.created" {post_id, user_id}
  → Feed Service reads A's 200 followers
  → Writes post_id to each follower's Redis feed list

Reads are fast: Redis returns the newest post IDs from a pre-computed per-user list without touching the database. The tradeoff: 200 followers is fine. 50 million (think: a Kylie Jenner post) is catastrophic. Even if the fan-out workers sustain 1,000 feed writes/second, reaching 50M followers takes about 14 hours. TMZ will have the story first.
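A minimal sketch of the push model, with plain Python dictionaries standing in for the social graph and for Redis. The names followers_of, feeds, and on_post_created are hypothetical, and the 200-entry cap is an assumed per-user feed length. The second function reproduces the celebrity arithmetic.

```python
from collections import defaultdict, deque

FEED_CAP = 200  # assumed maximum number of post IDs kept per user

followers_of = defaultdict(set)                       # author -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_CAP))   # user -> newest-first post ids


def on_post_created(event):
    """Handle a "post.created" event by pushing the post to every follower."""
    targets = followers_of.get(event["user_id"], set())
    for follower_id in targets:
        # appendleft keeps the newest post first; maxlen drops the oldest entry.
        feeds[follower_id].appendleft(event["post_id"])
    return len(targets)  # number of feed writes this single post caused


def fanout_hours(follower_count, writes_per_second=1_000):
    """Time to finish a fan-out at a fixed sustained write rate."""
    return follower_count / writes_per_second / 3600
```

One post costs one write per follower, which is why the cost is invisible at 200 followers and unworkable at 50 million.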

Fan-out on Read (Pull Model)

When a user opens the app, query all followees' recent posts and merge them in real time.

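from itertools import chain  # get_following / get_recent_posts are assumed data-access helpers

def get_feed(user_id):
    followees = get_following(user_id)  # could be 500 accounts
    # One query per followee: this per-read fan-in is what makes pull expensive
    posts = [get_recent_posts(uid, limit=20) for uid in followees]
    return sorted(chain(*posts), key=lambda p: p.created_at, reverse=True)[:20]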

Writes are cheap. Reads are brutally expensive. Merging 500 accounts' posts on every feed load, at 58,000 requests/second, means up to 29 million per-followee lookups per second, which melts your database cluster.
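If each followee's recent posts already arrive sorted newest-first, the merge step itself does not need a full sort. The sketch below uses the standard library's heapq.merge to do a k-way merge and stop after the first page. The Post dataclass and merge_feeds are illustrative names, and the per-followee queries, which dominate the cost, are assumed to have already run.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Post:
    post_id: str
    user_id: str
    created_at: int  # epoch seconds


def merge_feeds(per_followee_posts, limit=20):
    """k-way merge of lists that are each already sorted newest-first."""
    merged = heapq.merge(
        *per_followee_posts,
        key=lambda p: p.created_at,
        reverse=True,  # inputs must be in descending order for this to hold
    )
    # The merge is lazy, so only the first `limit` posts are materialized.
    return list(islice(merged, limit))
```

This trims CPU on the merge, but it does nothing about the per-followee queries, which is the real reason the pure pull model does not scale.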

The Hybrid: What Instagram Actually Does

Push for normal users, pull for celebrities. Define a follower-count threshold (Instagram reportedly uses around 1 million). Below the threshold: fan-out on write. Above it: skip the fan-out and pull that account's posts at read time.

At read time, the Feed Service fetches the user's pre-computed feed from Redis and stitches in any posts from high-follower accounts they follow by pulling those directly from the Post Service. The merge happens in the Feed Service in memory before returning.

Feed read for User B:
  1. Load pre-computed feed from Redis (post IDs from normal followees)
  2. Check if B follows any celebrities (cached flag)
  3. If yes, pull recent posts from the Post Service
  4. Merge + deduplicate + rank
  5. Return top 20 post IDs → client fetches metadata

This is the answer the interviewer is waiting for. Name it, explain why, and move on.
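The read-time merge can be sketched as follows. This is an in-memory illustration under stated assumptions, not Instagram's implementation: the 1M threshold is the reported figure, "rank" is simplified to newest-first, and Post, should_fan_out, and read_feed are hypothetical names. The pre-computed list stands in for the Redis feed and recent_posts_by_user stands in for a call to the Post Service.

```python
from dataclasses import dataclass

CELEBRITY_THRESHOLD = 1_000_000  # reported figure, treated here as an assumption


@dataclass(frozen=True)
class Post:
    post_id: str
    user_id: str
    created_at: int  # epoch seconds


def should_fan_out(follower_count):
    """Write-time routing: push for normal accounts, skip for celebrities."""
    return follower_count < CELEBRITY_THRESHOLD


def read_feed(precomputed, celebrity_followees, recent_posts_by_user, limit=20):
    """Merge the pushed feed with posts pulled from followed celebrities."""
    candidates = list(precomputed)                    # step 1: pre-computed feed
    for celeb_id in celebrity_followees:              # steps 2-3: pull celebrity posts
        candidates.extend(recent_posts_by_user.get(celeb_id, []))
    unique = {post.post_id: post for post in candidates}   # step 4: deduplicate
    ranked = sorted(unique.values(), key=lambda p: p.created_at, reverse=True)
    return [post.post_id for post in ranked[:limit]]  # step 5: top post IDs
```

The pull side stays cheap because a typical user follows only a handful of accounts above the threshold, so the read does a few extra lookups instead of hundreds.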

Figure: Three feed generation approaches side by side: push model for normal users, pull model, and the hybrid Instagram actually uses.

Instagram's hybrid routes on follower count: push below ~1M followers, pull above it, merge at read time.

Media Upload: Never Touch the Data Yourself

The upload path matters more than most candidates realize. Show the steps.

Figure: Sequence diagram for presigned URL upload flow: client requests URL, uploads directly to S3, S3 fires a Kafka event, media processor creates multiple resolutions, CDN gets updated.

App servers coordinate the handoff, then get out of the way.

Client requests a presigned upload URL: POST /posts/upload-url
Post Service generates a presigned S3 URL (valid for 60 seconds), stores a "pending" post record, returns the URL
Client uploads directly to S3. The app server never sees the bytes.
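S3 emits an upload-completion event that lands on Kafka. S3 notifies SQS, SNS, Lambda, or EventBridge natively, so a small bridge forwards the event to the Kafka topic.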
Media Processor picks up the event, creates multiple resolutions (150px thumbnail, 640px, 1080px), re-uploads to S3, updates the CDN origin
Post Service marks the post "published"

Decoupling storage from compute makes uploads horizontally scalable and keeps your API servers lean. Every megabyte you route through an application server is a megabyte that server can't spend on anything else.
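The handoff above can be sketched without any cloud SDK. The code below is a stand-in for the idea of an expiring signed URL, not S3's actual signing scheme: it signs the post ID and expiry with HMAC, records the post as pending, and flips it to published once processing finishes. SECRET, UPLOAD_HOST, and every function name are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"demo-signing-key"                 # hypothetical signing key
UPLOAD_HOST = "https://uploads.example.com"  # hypothetical storage endpoint
posts = {}                                   # post_id -> status


def _sign(post_id, expires):
    message = f"{post_id}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def create_upload_url(post_id, now=None, ttl=60):
    """Post Service step: record a pending post and return a short-lived URL."""
    now = time.time() if now is None else now
    expires = int(now) + ttl
    posts[post_id] = "pending"
    query = urlencode({"expires": expires, "sig": _sign(post_id, expires)})
    return f"{UPLOAD_HOST}/{post_id}?{query}"


def is_upload_allowed(post_id, expires, sig, now=None):
    """Storage-side check: signature must match and the URL must not be expired."""
    now = time.time() if now is None else now
    return now <= expires and hmac.compare_digest(sig, _sign(post_id, expires))


def on_media_processed(post_id):
    """Final step: only a pending post can become published."""
    if posts.get(post_id) == "pending":
        posts[post_id] = "published"
```

The application server only ever handles the post ID, the expiry, and a signature, which is why this path scales independently of media size.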

Caching: Three Layers, One Rule

The rule: anything read more than once should come from cache, not database.

There are two hard problems in computer science: cache invalidation, naming things, and off-by-one errors. You just need to show you know where caching belongs.

Redis (L1 cache): Feed = sorted set of post IDs per user (capped at 200). Profiles = hash, TTL 5 minutes. Post metadata = TTL 1 minute. Session tokens = TTL equals JWT expiry.

CDN: Cache-Control set to 1 year for immutable media. Cache hit rate target above 99%. Every miss is a round trip to S3.

Application-level:

The social graph is read on every feed load. Cache followee lists in Redis, TTL 10 minutes. A stale follow list for 10 minutes is acceptable; a slow feed every page load is not.
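The followee-list cache follows the cache-aside pattern: check the cache, fall back to the database on a miss or an expired entry, then store the result with a TTL. A minimal in-process sketch is below. TTLCache and get_or_load are illustrative names, and in the design above this role is played by Redis with a key expiry rather than a Python dictionary.

```python
import time


class TTLCache:
    """Tiny cache-aside helper with per-entry expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._store = {}      # key -> (expires_at, value)

    def get_or_load(self, key, ttl_seconds, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # fresh hit: no database call
        value = loader()                     # miss or expired: hit the source of truth
        self._store[key] = (now + ttl_seconds, value)
        return value
```

With a 600-second TTL, a user who refreshes the feed ten times in ten minutes triggers one social-graph query instead of ten, at the cost of a new follow taking up to ten minutes to show up.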

Name the Tradeoffs Out Loud

Interviewers give credit for naming tradeoffs explicitly. Most candidates say "I'd use Cassandra here." The better version: "I'd use Cassandra because we need high write throughput for likes, don't need joins, and eventual consistency is fine since a like count off by one for 500ms doesn't matter."

Four to name in an Instagram interview:

1. Consistency vs availability for engagement data. Likes and comment counts can be eventually consistent. User authentication and post creation should be strongly consistent. Different services can use different consistency guarantees.

2. Pre-computed feed vs real-time ranking. Pre-computing is fast but stale. Real-time ranking is fresh but expensive. A common middle ground, and reportedly close to what Instagram does, is a pre-computed candidate list with an ML ranking pass over the final ~50 candidates before serving.

3. SQL vs NoSQL. PostgreSQL for users, posts, and the social graph: relational queries, foreign key integrity, ACID. Cassandra for likes and comments: horizontal write scale, no joins needed.

4. Monolith vs microservices. Start this conversation with "Instagram ran on a Django monolith and had just 13 employees when Facebook acquired it." Microservices complexity is justified by scale, not by preference. Show you understand the cost.

How to Run the Instagram System Design Interview Clock

Most candidates run out of time because they over-engineer the first component. Go deep on feed generation, not the API gateway.

0-5 min: Requirements. Force scope. Draw nothing yet.
5-10 min: Scale estimation. Derive write QPS, read QPS, storage. Connect numbers to choices.
10-20 min: High-level design. API gateway, services, databases, CDN. One box per service.
20-40 min: Deep dive on feed generation. This is where the interview lives.
40-45 min: Bottlenecks and tradeoffs. Name what breaks first.

When you go deep on feed generation, follow the pattern: naive solution, problem with naive, improved solution, tradeoff of improvement. Don't skip to the answer. Show the reasoning.

Six Things to Carry In

Scope to core features. Stories and Reels are separate conversations.
500M DAU creates roughly 100× more reads than writes (58,000/s vs. 580/s). Design the read path first.
Media goes directly to object storage. Your app servers move URLs, not bytes.
Fan-out on write for normal users, pull for celebrities, merge at read time.
PostgreSQL for relational data, Cassandra for engagement writes, Redis for the feed hot path.
Explicitly name the consistency tradeoffs. Eventual is fine for likes, not for auth.

If you want to practice articulating this out loud, not just reading it, SpaceComplexity runs voice-based system design mock interviews with rubric-based feedback across communication, depth, and tradeoff reasoning. Designing Instagram at 2 billion users reads clearly on paper. Explaining it cold under interview pressure is a different skill.

For more on the interview execution side, see how interviewers score tradeoff communication and why what you say while you think matters as much as the final design.