How to Design Pastebin in a System Design Interview

You get the system design prompt: "Design Pastebin." It sounds trivial. A box where you paste text, click submit, get a URL back. Done in ten minutes if you're generous with the box labeled "database."

Wrong. What your interviewer is actually testing is whether you can take a deceptively simple product and find the real engineering challenges underneath it. Pastebin is a read-heavy, write-once system with a URL generation problem, an expiration problem, and a storage split problem, all wrapped in a 45-minute time box. It's a better signal than most.

Full walkthrough below: requirements, data model, API, URL generation, caching, cleanup, tradeoffs, and how to pace yourself through it.

What Pastebin System Design Is Actually Testing

Pastebin is not a storage problem disguised as an interview question. The core challenge is designing a read-heavy system where the write path is trivial but the read path must scale to 10x the write load, globally, with sub-100ms latency.

A paste is written once. It may be read by a developer debugging in Tokyo, a student in Berlin, and a recruiter in Austin, all in the same hour. The write path almost doesn't matter. The read path is everything.

Secondary tests: how you generate collision-free short URLs at scale, how you handle expiration without taking down the database with it, and whether you can articulate the tradeoff between storing content in a database versus object storage.

Start Here: Clarify Before You Design

Spend five minutes on this. The answers change your architecture.

Functional requirements (confirm these):

Create a paste, get a short URL back
Anyone with the URL can read the paste
Pastes expire after a configurable time (default 24 hours, options up to "never")
Optional: user accounts, paste visibility (public/private), syntax highlighting

Non-functional requirements (propose these, get agreement):

Reads should be fast. Sub-100ms p99 in the user's region.
High availability. Paste creation can tolerate brief degradation; reads cannot.
URLs must not be guessable or sequential. No enumeration attacks.
Durability: a created paste must not disappear before its expiration time.

Scope to skip (say this out loud): analytics, search, user feeds, comments, syntax highlighting rendering. You're designing the core paste storage and retrieval service. This is the part where you say it out loud and someone asks about search anyway. Say no again, politely.

The Numbers (Back of Envelope)

Your interviewer wants to see you quantify before you design. This takes two minutes and anchors every decision you make afterward.

Assume: 1 million new pastes per day, read-to-write ratio of 10:1.

Writes:  1M / 86,400s ≈ 12 writes/sec
Reads:   10M / 86,400s ≈ 115 reads/sec

Average paste size: 10 KB (code snippets, logs, config files).

Storage per day:  1M × 10 KB = 10 GB/day
Storage per year: ~3.65 TB
Storage per 3yrs: ~11 TB

Bandwidth:

Write ingress: 12 × 10 KB ≈ 120 KB/s  (trivial)
Read egress:   115 × 10 KB ≈ 1.15 MB/s (very manageable)

These are modest numbers. Practically boring. The system does not need exotic sharding. What it needs is a smart read path, because even 115 reads/second concentrated on a single database server with no caching would be wasteful when most of those reads are for the same 20% of hot pastes.

State your conclusion: "This is a read-heavy but not high-throughput system. Our design should optimize for read latency via caching, not for write throughput via sharding."
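
If you want to sanity-check the arithmetic before you say it out loud, the estimate fits in a few lines. This is a rough sketch using the assumptions stated above (1 million pastes per day, 10:1 reads to writes, 10 KB average size) and decimal units; the function name and dictionary keys are illustrative only. The unrounded rates come out near 11.6 writes/sec and 115.7 reads/sec.

```python
SECONDS_PER_DAY = 86_400


def estimate(pastes_per_day: int, read_write_ratio: int, avg_paste_kb: int) -> dict:
    """Return rough traffic and storage figures, using decimal units (1 GB = 10**6 KB)."""
    writes_per_sec = pastes_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * read_write_ratio
    storage_gb_per_day = pastes_per_day * avg_paste_kb / 1_000_000
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "storage_gb_per_day": storage_gb_per_day,
        "storage_tb_per_year": storage_gb_per_day * 365 / 1_000,
        "write_ingress_kb_per_sec": writes_per_sec * avg_paste_kb,
        "read_egress_mb_per_sec": reads_per_sec * avg_paste_kb / 1_000,
    }


figures = estimate(pastes_per_day=1_000_000, read_write_ratio=10, avg_paste_kb=10)
for name, value in figures.items():
    print(f"{name}: {value:.2f}")
```

Change one input (say, a 100:1 read ratio) and rerun it. Knowing which conclusions survive a changed assumption is what the interviewer is probing for.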

[Figure: back-of-envelope traffic estimates for Pastebin: 12 writes/sec, 115 reads/sec, 10 GB/day storage]

Back-of-envelope: the write load is laughably small. The read path is where the interesting decisions live.

The Four-Layer Architecture

The system has four layers.

Write path: Client posts content to an API server. The API server asks a Key Generation Service (KGS) for an unused ID, writes the paste record to the database and content to object storage, then returns the short URL to the client.

Read path: Client hits the short URL. The API server checks an in-memory cache (Redis). Cache hit returns content immediately. Cache miss fetches from object storage (or database for small pastes), populates the cache, returns the content.

Expiration: A background cleanup worker runs on a schedule, batching soft-deleted or expired pastes and removing them from storage and cache.

Supporting services: A load balancer in front of the API tier. Read replicas for the metadata database. A CDN for publicly accessible paste content.
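
The write path is short enough to sketch end to end. The version below is a minimal illustration, not a production design: `InMemoryKGS`, `metadata_db`, `object_store`, and the `paste.example` domain are hypothetical stand-ins for the real KGS, database, object storage, and hostname. Writing the content before the metadata row is an assumption made here so a metadata record never points at a missing object.

```python
import secrets
import string
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set

ALPHABET = string.ascii_letters + string.digits  # 62 URL-safe characters


class InMemoryKGS:
    """Hypothetical stand-in for the KGS: never returns the same key twice."""

    def __init__(self) -> None:
        self._issued: Set[str] = set()

    def next_key(self) -> str:
        while True:
            key = "".join(secrets.choice(ALPHABET) for _ in range(8))
            if key not in self._issued:
                self._issued.add(key)
                return key


@dataclass
class PasteRecord:
    paste_id: str
    created_at: datetime
    expires_at: Optional[datetime]  # None means the paste never expires
    size_bytes: int
    content_key: str


# Plain dicts standing in for the metadata database and object storage.
metadata_db: Dict[str, PasteRecord] = {}
object_store: Dict[str, bytes] = {}


def create_paste(content: str, kgs: InMemoryKGS,
                 ttl: Optional[timedelta] = timedelta(hours=24)) -> str:
    """Store a paste and return its short URL. Pass ttl=None for 'never expires'."""
    paste_id = kgs.next_key()
    body = content.encode("utf-8")
    content_key = f"pastes/{paste_id}"
    now = datetime.now(timezone.utc)

    # Assumption: content first, metadata second, so readers never see a dangling key.
    object_store[content_key] = body
    metadata_db[paste_id] = PasteRecord(
        paste_id=paste_id,
        created_at=now,
        expires_at=None if ttl is None else now + ttl,
        size_bytes=len(body),
        content_key=content_key,
    )
    return f"https://paste.example/{paste_id}"
```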

[Figure: Pastebin system architecture: client, load balancer, three API servers, KGS, Redis cache, PostgreSQL with read replicas, S3 object storage, CDN, and a background cleanup worker]

Four layers: edge caching, API tier, data stores, background worker. Each one solves a different problem so you are not making one box do everything.

Data Model: Two Stores, Not One

The naive approach is to shove everything into one database. That works at small scale. At any real scale, you want to split it.

Metadata store (relational or document DB):

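CREATE TABLE pastes (
  paste_id    CHAR(8)      PRIMARY KEY,          -- base62 short ID, e.g. aB3kZ9mX
  user_id     BIGINT       NULL,                 -- NULL for anonymous pastes
  created_at  TIMESTAMP    NOT NULL,
  expires_at  TIMESTAMP    NULL,                 -- NULL means the paste never expires
  size_bytes  INT          NOT NULL,
  content_key VARCHAR(255) NOT NULL  -- S3 object key
);

-- Supports the cleanup scan on expires_at < NOW()
CREATE INDEX idx_pastes_expires_at ON pastes (expires_at);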

paste_id is the short identifier (e.g., aB3kZ9mX). content_key points to object storage. The metadata table is small per row, queryable, and indexable by expiration time for cleanup.

Content store (object storage, e.g., S3):

Each paste's raw text lives at a key like pastes/aB3kZ9mX. Object storage is cheap, durable, and trivially distributable. For small pastes (under 1 KB), you can inline content in the metadata table to save a round trip, but treat this as an optimization, not the default.

Why not store everything in the database? A 10 TB content database is expensive to host, slow to replicate, and painful to back up. Object storage handles durability with built-in replication and costs a fraction of the price. Separating metadata from content is the right call for any large-scale store of user-generated blobs.
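
The inline optimization comes down to one size check at write time. Here is a small sketch of that decision, assuming a 1 KB cutoff as described above. The `inline_content` field is hypothetical: the schema shown earlier has no such column and declares `content_key` as NOT NULL, so adopting this would mean relaxing that constraint.

```python
INLINE_THRESHOLD_BYTES = 1024  # the "under 1 KB" cutoff; tune to taste


def plan_storage(paste_id: str, content: str) -> dict:
    """Decide where a paste's content should live, based on its encoded size."""
    body = content.encode("utf-8")
    if len(body) < INLINE_THRESHOLD_BYTES:
        # Small paste: keep the text next to the metadata and skip object storage.
        return {"inline_content": content, "content_key": None, "size_bytes": len(body)}
    # Everything else goes to object storage under a predictable key.
    return {
        "inline_content": None,
        "content_key": f"pastes/{paste_id}",
        "size_bytes": len(body),
    }
```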

[Figure: Pastebin two-store data model: SQL pastes table with content_key pointing to S3 object storage paths]

Metadata stays in Postgres (fast, queryable, indexed by expires_at). Content goes to S3 (cheap, durable, scales forever).

Three Endpoints. No More.

Keep it narrow.

POST /pastes
Body: { content, expires_in, visibility }
Returns: { paste_id, short_url, expires_at }

GET /pastes/{paste_id}
Returns: { content, created_at, expires_at }

DELETE /pastes/{paste_id}
Auth required. Returns: 204 No Content

The read endpoint is the hot path. It should return content directly, not a redirect to object storage, so your caching layer can intercept it.

Expose the S3 URL directly and you bypass your expiration logic entirely. A client can cache the presigned URL and read an expired paste. Always route reads through your API tier so you can enforce expiration checks at read time.
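
One detail of POST /pastes worth pinning down is how `expires_in` becomes `expires_at`. The sketch below assumes a small fixed menu of options with a 24-hour default. The option names, the `paste.example` domain, and the function name are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_EXPIRES_IN = "24h"
# Hypothetical option set; None means the paste never expires.
EXPIRY_OPTIONS = {
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "never": None,
}


def build_create_response(paste_id: str, expires_in: Optional[str], now: datetime) -> dict:
    """Build the POST /pastes response body for an already-stored paste."""
    choice = expires_in or DEFAULT_EXPIRES_IN
    if choice not in EXPIRY_OPTIONS:
        raise ValueError(f"unsupported expires_in: {choice!r}")
    ttl = EXPIRY_OPTIONS[choice]
    expires_at = None if ttl is None else now + ttl
    return {
        "paste_id": paste_id,
        "short_url": f"https://paste.example/{paste_id}",
        "expires_at": None if expires_at is None else expires_at.isoformat(),
    }
```

Rejecting unknown values up front keeps arbitrary lifetimes out of the metadata table, which in turn keeps cache TTLs predictable.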

The Hard Part: Generating Unique URLs

This is the design question inside the design question. You need short, URL-safe, collision-free identifiers.

Option 1: Hash on write. Take the paste content, run MD5 or SHA-256, and take the first 8 characters of the base62-encoded digest. (Eight hex characters would give only 16^8 ≈ 4.3 billion IDs.) Fast and dependency-free.

Problems: identical content yields identical IDs (information leak). You also need to handle collisions on truncated hashes, which means a read-before-write check on every creation. Under load, that check becomes a bottleneck. The hash table time complexity deep-dive has the collision math if you want the numbers.
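
The information leak is easy to see in code. The sketch below is one possible reading of hash-on-write: it reduces a SHA-256 digest modulo 62^8 and base62-encodes the result, rather than slicing characters off the digest. That reduction is an assumption made here to keep the IDs roughly uniform, and the function names are illustrative.

```python
import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
ID_LENGTH = 8


def base62(number: int, length: int) -> str:
    """Encode a non-negative integer as base62, left-padded to a fixed length."""
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)).rjust(length, ALPHABET[0])


def hash_id(content: str) -> str:
    """Derive a deterministic 8-character ID from the paste content."""
    digest = hashlib.sha256(content.encode("utf-8")).digest()
    return base62(int.from_bytes(digest, "big") % 62 ** ID_LENGTH, ID_LENGTH)


first = hash_id("password=hunter2")
second = hash_id("password=hunter2")
print(first == second)  # same content, same ID, no matter who pasted it
```

Anyone who can guess a paste's content can compute its URL and check whether it exists. A per-paste random salt would fix that, at which point you have reinvented random IDs.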

Option 2: Random base62 ID. Generate 8 random characters from [A-Za-z0-9] (62 characters), giving 62^8 ≈ 218 trillion possible IDs. Check for collision before inserting.

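Still has the read-before-write problem. Three years at 1 million pastes per day is roughly 1.1 billion IDs. The chance that any single new ID collides stays tiny (about 1 in 200,000 even at the three-year mark), but the birthday math says to expect a few thousand collisions in total over that period. You cannot skip the check, so you are still making a DB round-trip on every write.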
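
The collision math for random IDs is worth running once. This sketch generates IDs with Python's `secrets` module and uses the standard birthday approximation n(n-1)/(2N) for the expected number of colliding pairs. The `taken` set stands in for the database's uniqueness check, and the helper names are illustrative.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters
ID_LENGTH = 8
KEY_SPACE = len(ALPHABET) ** ID_LENGTH  # 62^8


def random_id() -> str:
    """Return a random 8-character base62 ID from a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def per_insert_collision_probability(existing_ids: int) -> float:
    """Chance that one fresh random ID matches any ID already stored."""
    return existing_ids / KEY_SPACE


def expected_total_collisions(total_ids: int) -> float:
    """Birthday approximation: expected colliding pairs among total_ids random IDs."""
    return total_ids * (total_ids - 1) / (2 * KEY_SPACE)


def insert_with_retry(taken: set, max_attempts: int = 5) -> str:
    """Pick an unused ID, retrying on the rare collision."""
    for _ in range(max_attempts):
        candidate = random_id()
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError("could not find a free ID")


three_years = 1_000_000 * 365 * 3
print(f"key space: {KEY_SPACE:,}")
print(f"per-insert collision chance: {per_insert_collision_probability(three_years):.2e}")
print(f"expected collisions over 3 years: {expected_total_collisions(three_years):,.0f}")
```

Each individual insert almost never collides, but across a billion inserts a few thousand will. That is why the uniqueness check (or a unique constraint plus retry) has to stay.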

Option 3: Key Generation Service (KGS). A dedicated service pre-generates random 8-character base62 strings and stores them in two tables: keys_available and keys_used. On paste creation, the API server asks the KGS for one key. The KGS moves it from available to used and returns it. No collision check at write time.

The KGS is the right answer for most interviews. It decouples key generation from write latency, guarantees uniqueness without a real-time DB check, and is easy to make highly available by running two KGS instances that each hold a small in-memory buffer of pre-fetched keys.
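
Here is a minimal in-process sketch of that idea. `KeyStore` is a hypothetical stand-in for the keys_available and keys_used tables, and a lock stands in for a database transaction. One assumption to flag: this version marks keys as used at the moment a batch is claimed into memory, so a crash wastes the unissued keys in the buffer but can never hand out a duplicate.

```python
import secrets
import string
import threading
from collections import deque

ALPHABET = string.ascii_letters + string.digits


class KeyStore:
    """In-memory stand-in for the keys_available / keys_used tables."""

    def __init__(self, pool_size: int) -> None:
        self._lock = threading.Lock()
        self.available = set()
        self.used = set()
        while len(self.available) < pool_size:
            self.available.add("".join(secrets.choice(ALPHABET) for _ in range(8)))

    def claim_batch(self, count: int) -> list:
        """Atomically move up to `count` keys from available to used and return them."""
        with self._lock:
            batch = [self.available.pop() for _ in range(min(count, len(self.available)))]
            self.used.update(batch)
            return batch


class KeyGenerationService:
    """One KGS instance holding a small buffer of pre-claimed keys."""

    def __init__(self, store: KeyStore, batch_size: int = 100) -> None:
        self._store = store
        self._batch_size = batch_size
        self._buffer = deque()
        self._lock = threading.Lock()

    def next_key(self) -> str:
        with self._lock:
            if not self._buffer:
                # Refill from the shared store; other instances get disjoint batches.
                self._buffer.extend(self._store.claim_batch(self._batch_size))
            if not self._buffer:
                raise RuntimeError("key pool exhausted")
            return self._buffer.popleft()
```

Because every batch is claimed atomically, two instances sharing one store never see the same key. That is what makes running a second instance safe.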

[Figure: KGS workflow: keys_available table feeds the KGS service, which issues unique paste IDs to the API server and moves them to the keys_used table]

The KGS does the uniqueness work upfront. No collision check at write time. Your API server just asks "give me a key" and moves on.

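Expect this follow-up: "What if the KGS crashes before it moves keys to the used table?" Answer: it depends on when keys get marked as used. If the KGS marks a batch as used the moment it loads the batch into memory, a crash only loses the unissued keys in the buffer. Gone. A moment of silence for those 8-character strings. With 218 trillion IDs, you can afford the waste. If it marks keys as used only after handing them out, a restart might reissue one of them. For a Pastebin, the primary key on paste_id rejects the duplicate insert and the API server retries with a fresh key. For a URL shortener handling payments, you'd want stricter guarantees.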

Scaling the Read Path

At 115 reads/sec this system is not under any stress. Your load balancer is basically on a beach vacation right now. But the interviewer wants to see that you know how to scale it.

Layer 1: Redis cache. Cache paste content keyed on paste_id. Set each entry's TTL to the time remaining until the paste's expires_at, so a cached copy never outlives the paste. Cache hit rate should be high because the top 20% of pastes generate roughly 80% of reads (standard long-tail behavior). Use a cache-aside pattern: a miss reads from object storage, populates the cache, and returns the content.

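Layer 2: CDN. For public pastes, the GET response is cacheable. Route reads through a CDN (CloudFront, Fastly). This pushes content to edge nodes globally, dropping latency for international reads with no backend changes beyond setting cache headers. Keep the edge cache lifetime no longer than the paste's remaining time to live, and purge the edge copy on DELETE, or the CDN will keep serving pastes your API has already expired. A CDN alone can absorb the majority of read traffic for a public paste service.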

Layer 3: Read replicas. Your metadata DB receives a read on every cache miss (to check expiration and fetch the content key). Add one or two read replicas. Direct read queries to replicas, writes to primary. Replication lag is acceptable here: if a replica is a few hundred milliseconds behind, the worst case is a slightly stale expiration check or a just-created paste that is not yet visible on the replica. Neither is a critical consistency requirement, and falling back to the primary when a replica lookup misses covers the second case.
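
The cache-aside read from Layer 1 looks roughly like this. It is a sketch under assumptions: `TTLCache` is a tiny in-process stand-in for Redis, plain dicts stand in for the metadata DB and object storage, timestamps are epoch seconds, and `MAX_CACHE_TTL` is a made-up one-hour cap so never-expiring pastes still age out of the cache.

```python
import time
from typing import Callable, Optional

MAX_CACHE_TTL = 3600.0  # assumed cap, in seconds


class TTLCache:
    """Minimal stand-in for a Redis key with an expiry."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._items = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if self._clock() >= deadline:
            del self._items[key]  # entry outlived its TTL
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (value, self._clock() + ttl_seconds)


def read_paste(paste_id: str, cache: TTLCache, metadata: dict, object_store: dict,
               clock: Callable[[], float] = time.time) -> Optional[str]:
    """Return paste content, or None if the paste is missing or expired."""
    cached = cache.get(paste_id)
    if cached is not None:
        return cached  # cache hit: no database or object storage access

    record = metadata.get(paste_id)
    if record is None:
        return None
    now = clock()
    expires_at = record["expires_at"]  # epoch seconds, or None for "never"
    if expires_at is not None and expires_at <= now:
        return None

    content = object_store[record["content_key"]]
    # TTL is the time remaining until expiry, capped so cold pastes age out.
    ttl = MAX_CACHE_TTL if expires_at is None else min(MAX_CACHE_TTL, expires_at - now)
    cache.set(paste_id, content, ttl)
    return content
```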

[Figure: Pastebin read path: client to CDN edge cache, miss goes to API server, Redis cache, then S3 object storage with PostgreSQL read replicas for expiration checks]

Three layers of caching. Most requests never leave the CDN. The ones that do die at Redis. S3 is the last resort.

How to Handle Expiration Without Blowing Up Your Database

Two strategies, and both belong in your answer.

Lazy expiration: when a read request reaches the API server on a cache miss, check expires_at in the metadata DB. If the paste is expired, return 404 and optionally queue it for deletion. This avoids any background work and keeps the origin correct by definition: it never serves content past its expiration, even if the record still exists. Cached copies stay correct as long as Redis and CDN TTLs never outlive expires_at.

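Background cleanup worker: a scheduled job (cron-like, runs hourly or nightly) queries SELECT paste_id FROM pastes WHERE expires_at < NOW() in bounded batches, soft-deletes each batch, then removes the content from S3 and evicts it from Redis. Run the scan against a read replica during off-peak hours; the soft-deletes themselves are writes and must go to the primary. Pastes that never expire have a NULL expires_at, so the comparison skips them. Lazy expiration handles correctness; the background worker handles storage reclamation.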
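
Both strategies fit in one short sketch. It uses an in-memory SQLite database as a stand-in for the metadata store, integer epoch seconds for timestamps, and plain dicts for S3 and Redis. For brevity it deletes rows outright instead of soft-deleting them first, and every function name here is illustrative.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the pastes metadata table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pastes ("
        "paste_id TEXT PRIMARY KEY, expires_at INTEGER NULL, content_key TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX idx_pastes_expires_at ON pastes (expires_at)")
    return db


def is_readable(db: sqlite3.Connection, paste_id: str, now: int) -> bool:
    """Lazy expiration: decide at read time whether the paste may be served."""
    row = db.execute(
        "SELECT expires_at FROM pastes WHERE paste_id = ?", (paste_id,)
    ).fetchone()
    if row is None:
        return False
    expires_at = row[0]
    return expires_at is None or expires_at > now  # NULL means never expires


def cleanup_expired(db: sqlite3.Connection, object_store: dict, cache: dict,
                    now: int, batch_size: int = 500) -> int:
    """Background worker: reclaim expired pastes in bounded batches."""
    removed = 0
    while True:
        rows = db.execute(
            "SELECT paste_id, content_key FROM pastes "
            "WHERE expires_at < ? ORDER BY expires_at LIMIT ?",
            (now, batch_size),
        ).fetchall()
        if not rows:
            return removed
        for paste_id, content_key in rows:
            object_store.pop(content_key, None)  # drop the content blob
            cache.pop(paste_id, None)            # evict any cached copy
        db.executemany(
            "DELETE FROM pastes WHERE paste_id = ?", [(pid,) for pid, _ in rows]
        )
        db.commit()
        removed += len(rows)
```

Small batches keep each transaction short, so the worker never holds locks long enough to hurt the read path.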

Do not try to delete at exact expiration time for every paste. That requires a priority queue of future deletions, which adds complexity without meaningful benefit. You are building a paste service, not an alarm clock.

The Tradeoffs Your Interviewer Wants to Hear

Metadata in SQL vs. NoSQL: SQL gives you easy range queries for cleanup (expires_at < NOW()), transactions, and familiar tooling. NoSQL gives you horizontal scalability. For Pastebin's write volume (12 writes/sec), a single well-indexed PostgreSQL instance handles it comfortably. Choose SQL, acknowledge the NoSQL option, and say why you are not taking it.

Inline content vs. object storage: Inlining content in the DB saves one round trip per cache miss. At 10 KB average paste size, your DB grows at 10 GB/day, which is manageable but expensive compared to S3 at roughly $0.023 per GB per month. For anything over 1-2 KB, object storage wins on cost.

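KGS single point of failure: The KGS is stateful. Run two instances that each claim their own non-overlapping batches of keys, or use a distributed ID generator (Snowflake-style timestamps), keeping in mind that time-ordered IDs are easier to guess and cut against the no-enumeration requirement. For an interview, "run two instances, each with its own pre-fetch buffer, and take a short-lived lock while claiming a batch so the buffers never overlap" is enough.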

Rate limiting: Without it, a single client can create 10 million pastes and drain your pre-generated key pool and your storage budget. Add a token bucket rate limiter at the API gateway, keyed on IP or user ID. Reasonable default: 100 pastes per hour per IP. That client exists. They are always out there. They will find your endpoint within the first hour of launch.

How to Use Your 45 Minutes

Most candidates front-load the data model and run out of time before covering bottlenecks and tradeoffs. That's backwards. It's the same instinct that makes engineers spend 45 minutes on folder structure before writing a single function.

A workable pacing:

Minutes 0-5: Requirements and capacity estimation. Do not skip this. It earns you credibility and establishes the constraints that justify every decision afterward.
Minutes 5-15: High-level architecture and data model. Whiteboard the four layers, draw the two stores.
Minutes 15-25: URL generation deep dive. This is the sharpest signal question in this problem.
Minutes 25-40: Caching, expiration, scaling. Talk through each layer.
Minutes 40-45: Tradeoffs and follow-ups. Let the interviewer drive.

The single biggest mistake is treating Pastebin as a simple CRUD app and spending all your time on the schema. It's a read scaling problem with a key generation puzzle in the middle. Lead with those.

Narrate your uncertainty when you hit it. "I'd want to understand the expected read-to-write ratio before committing to a caching strategy" is better than barreling forward with assumptions. Interviewers score communication. Saying you're making an assumption and why is a positive signal, not a hedge.

Take This Into the Room

Pastebin is read-heavy. Optimize the read path first.
Split metadata (SQL) from content (object storage). Do not cram text into the DB.
Use a Key Generation Service for unique IDs. Hash-on-write has collision and information-leak problems.
Base62, 8 characters = 218 trillion IDs. You will not run out.
Layer your caching: Redis for hot pastes, CDN for public content, read replicas for metadata.
Lazy expiration for correctness, background worker for storage cleanup.
Rate limit at the API gateway or you will be abused immediately.
In the interview: requirements first, URL generation is your deep-dive moment, tradeoffs close the loop.

Practicing system design verbally is a different skill from knowing the material. If you want to see what your explanations sound like under real interview pressure, SpaceComplexity runs voice-based mock interviews with structured feedback on exactly this kind of problem.