Read Replicas in System Design Interviews: One Problem, Not a General Fix

- Read replicas scale reads, not writes: adding replicas multiplies read throughput but does nothing when the primary's write capacity is saturated
- Async replication is the default tradeoff: replicas can lag by milliseconds to seconds, so consistency-sensitive reads must be routed to the primary
- Read-your-writes and in-transaction reads go to the primary: stale replicas break post-write confirmation pages and any read inside a write transaction
- Connection pooling (PgBouncer/ProxySQL) reduces write-path overhead before sharding and belongs explicitly in your answer
- Sharding is the write scaling answer: horizontal partitioning solves write saturation but breaks cross-shard queries and foreign keys
- Naming the replication lag tradeoff explicitly is what generates strong hire signals, not just proposing replicas
You have a database. It's getting hammered. Your first instinct is to throw more hardware at it, which buys you time but not architecture. The real question, the one that actually matters: are you bottlenecked on reads, writes, or both? The answer changes everything about what you should propose next.
Most production systems are read-heavy. For every write to a social feed, there might be thousands of reads. Read replicas exploit that asymmetry. But they solve exactly one problem, and system design interviewers notice when candidates treat them as a general cure for whatever ails the database.
The Problem Read Replicas Actually Solve
A single database node has limits. It can process only so many queries per second before response times climb. Vertical scaling (faster CPU, more RAM, faster disks) has a ceiling, and it gets expensive fast. At some point you're pricing custom server hardware and your manager is giving you a look.
Read replicas are additional database nodes that hold a copy of your data. Reads go to any replica; writes go to the primary. Five replicas roughly quintuples read throughput. The primary stops competing with every product page load for the same disk I/O, CPU, and lock budget.
Most traffic is reads. A typical e-commerce product page triggers dozens of read queries (product detail, reviews, recommendations, inventory) for every single write (an order). All of those reads were fighting over the same machine. Read replicas end that brawl.
AWS RDS supports up to 15 read replicas per primary for MySQL and PostgreSQL. Fifteen. You'll probably never need fifteen, but it's nice to know they're there, like a fire extinguisher you hope to never actually need.
How Replication Actually Works
The primary records every change to a log: Postgres uses a Write-Ahead Log (WAL), MySQL uses a binary log. Replicas subscribe to that stream and apply changes in the same order they happened on the primary. It's like a very boring podcast your replicas are always subscribed to.
Two modes matter for interviews.
Asynchronous replication is the default. The primary commits a write and acknowledges the client without waiting for replicas to confirm. Replicas catch up in the background, typically sub-millisecond under normal load but potentially seconds when write volume spikes. Fast for writers, but it introduces replication lag.
Synchronous replication makes the primary wait for at least one replica to confirm before acknowledging the client. Every write slows by the roundtrip to the replica, but if the primary dies instantly, a replica has no data loss. MySQL supports semi-synchronous replication via a plugin. Postgres supports synchronous standbys natively.
Most production setups mix modes: one synchronous replica for durability, the rest asynchronous for read scale. Replication lag is the core tradeoff to understand and communicate clearly.

Whose Reads Go Where?
Not all reads can safely go to replicas. This is the part candidates usually skip, and it's where interviewers quietly make up their minds about you.
Read-your-writes consistency. A user submits a form and lands on a confirmation page. If that read hits a replica that hasn't caught up, they see nothing. The user assumes the form didn't submit. They click again. Now you have duplicate orders and a support ticket. Route post-write reads for the same user to the primary, or add a short delay before hitting the replica.
Monotonic reads. A user refreshes twice and sees the new value, then the old value. Going backwards in time tends to erode user trust. Sticky sessions (always routing a user to the same replica) or routing to the primary after any write prevents this.
In-transaction reads. Any read inside a write transaction must go to the primary. The replica doesn't have the uncommitted write.
Latency-tolerant reads go to replicas; consistency-sensitive reads go to the primary. Activity feeds, product listings, aggregate stats, and analytics queries can tolerate a few seconds of lag. Payment status, post-write confirmation pages, and inventory counts after a reservation usually can't.
Name specific examples from the system you're designing. "Reviews and search results go to replicas. Checkout inventory checks go to primary." That specificity separates a good answer from a great one.
When Read Replicas Stop Helping
You added replicas, read throughput improved, and you celebrated. Then the write queue started climbing and you realized you'd solved the wrong problem. The primary, alone and handling every write for your entire dataset, has become the new bottleneck.
When write volume climbs, you see write queue depth grow, replication lag increase, lock contention on hot rows, and primary CPU maxing out. Adding more read replicas does nothing for any of this. You cannot read your way out of a write problem.
Three options, in the order you should present them:
Vertical scaling. Bigger machine, more RAM, faster NVMe disks. The right first step because it's operationally simple. Works until it doesn't.
Connection pooling. Tools like PgBouncer (Postgres) or ProxySQL (MySQL) sit in front of your database and multiplex thousands of application connections into a small pool of real database connections. PgBouncer in transaction mode handles thousands of app connections with 20-50 real connections. It doesn't increase raw throughput, but it keeps connection overhead from amplifying write pressure. Low cost, worth naming explicitly.
Sharding (horizontal partitioning). Split the dataset across multiple primaries. Each shard owns a subset and accepts writes only for that subset. A users table might partition by user ID hash, sending IDs 0-999 to shard A, 1000-1999 to shard B. Each shard scales and replicates independently.
Sharding is the real answer to write scaling. Cross-shard queries become expensive joins across network boundaries, foreign key constraints break, and multi-shard transactions require distributed coordination. Sharding is what interviewers eventually want to hear, but it should come after you've explained why simpler options fall short. Jumping straight to sharding signals you read a blog post, not that you understand the tradeoffs.
For a deeper look at sharding strategies, the database sharding guide covers consistent hashing, range partitioning, and hotspot problems in detail.
The Diagrams to Draw
Draw read/write separation first. Show the primary receiving writes and the load balancer routing reads to replicas. Then, only after the interviewer asks about write scale, expand to sharding. Presenting both upfront without a prompt looks like you rehearsed a script.
App Servers
/ \
Writes Reads
| |
v v
[Primary DB] [Load Balancer]
| / | \
+------>[Replica][Replica][Replica]
(async replication)
When write scaling comes up:
[Router / Proxy]
/ | \
[Shard A] [Shard B] [Shard C]
+reps +reps +reps
Showing both makes clear that read scaling and write scaling are separate problems with different solutions. Many candidates conflate them and propose sharding when someone just asks "how do you handle more traffic?"
What to Say in a Read Replicas System Design Interview
Reason through the traffic before proposing anything. Candidates who jump straight to a solution are guessing. Interviewers can tell.
- Estimate the read/write ratio. "This looks read-heavy, maybe 95/5. Product pages, feeds, and searches are all reads. Orders and user actions are writes."
- Propose replicas and separate the paths. "Read replicas handle the high-volume latency-tolerant reads. The primary handles all writes."
- Name the replication lag tradeoff explicitly. "Async replication means replicas might be a few hundred milliseconds behind. For these flows, we route to the primary. For everything else, replicas are fine."
- Address write scaling separately. "If write volume eventually saturates the primary, we can shard by user ID. The tradeoff is cross-shard queries get more complex."
- Talk about failure modes. "If a replica falls behind its lag threshold, the router stops sending it reads. We need to define our freshness SLA upfront."
Naming the tradeoff clearly is more impressive than naming the pattern. "Read replicas reduce read load" is table stakes. "Async replication introduces lag, so we need an explicit policy for consistency-sensitive reads" is what generates strong hire signals.
For replication strategies including leader election and failover, the database replication guide covers what happens when the primary fails.
Common Mistakes
Proposing replicas before identifying the bottleneck. If writes are the bottleneck, replicas don't help at all. Ask which operation is slow, first.
Ignoring replication lag entirely. Every interviewer who knows databases will probe this. Have a concrete policy, not just "it's fine." Because "it's fine" is exactly what engineers say right before it's not fine.
Treating sharding as the first solution. Sharding is operationally expensive. Most interviewers want to see you exhaust simpler options first: caching, read replicas, vertical scaling. Jumping to sharding in the first two minutes reads as memorization, not engineering judgment.
Conflating replicas with backups. Replicas are live read-serving nodes. Backups are point-in-time snapshots. If you delete a row on the primary, it replicates to replicas immediately. Replicas don't save you from your own mistakes. The CAP theorem guide covers why availability guarantees from replicas aren't the same as durability guarantees from backups.
Not knowing when to use a replica vs a cache. A cache is faster but adds invalidation complexity. A replica stays current automatically and supports arbitrary SQL. For expensive aggregations and full-text search, a dedicated replica often beats a cache. For hot static data, a cache wins. The caching strategies guide covers when each layer makes sense.
If you want to practice talking through database scaling under live interview pressure, SpaceComplexity runs voice-based mock system design interviews with rubric-based feedback. Explaining tradeoffs out loud before you sit down with a real interviewer makes a measurable difference.
Further Reading
- AWS RDS Read Replicas: official documentation on replica limits, promotion, and cross-region options
- PostgreSQL Streaming Replication: how WAL-based replication works at the protocol level
- MySQL Replication: binary log replication, semi-sync options, and GTID-based replication
- Designing Data-Intensive Applications, Chapter 5: Martin Kleppmann's canonical treatment of replication lag, consistency models, and failure scenarios
- PgBouncer documentation: how connection pooling reduces write-path overhead for Postgres