Design a Multiplayer Game Backend: The Interview Walkthrough

Most system design questions ask you to handle requests. A multiplayer game backend asks you to keep state synchronized across 64 players, 128 times per second, while half of them have 80ms pings and one of them is definitely on hotel Wi-Fi.

The failure mode in interviews is treating this like a chat app with weapons. Chat tolerates a 200ms delay. A bullet registration system does not.

The mental model that carries the whole interview: a game backend is a real-time state machine, not a request-response service. Everything else follows from that.

Set the Clock Before You Draw Anything

The first five minutes are scope, not architecture. Ask these:

Turn-based or real-time? Checkers tolerates HTTP polling. A fast-paced shooter like Fortnite needs UDP and a fixed-rate simulation loop.
Player count per session? A 2-player chess game and a 100-player battle royale are completely different scaling problems.
Persistent world or match-based? An MMO has one shared world. A round-based shooter creates and destroys server instances per match.
Competitive or casual? Competitive games demand 128Hz tick rates and server-authoritative hit detection. Casual games get away with looser constraints.

For the rest of this walkthrough, assume a competitive battle royale: 64 players per match, real-time, match-based, global player base.

The 45-minute clock roughly splits as: 5 minutes on requirements, 10 on high-level architecture, 10 on game server and networking, 10 on data model and persistence, 5 on scaling, 5 on tradeoffs. Treat that as a pacing guide, not a script.

Five Boxes on the Whiteboard

Draw these five components and you have the skeleton:

Matchmaking service (stateless): queues players, runs the skill-matching algorithm, allocates a game server instance for each match.
Game servers (stateful): one instance per active match. Runs physics, hit detection, and the simulation loop. Owns all live state.
Service layer (stateless): auth, player profiles, inventory, leaderboard, payments. Standard microservices, scale horizontally.
Data layer: Redis for hot session state, PostgreSQL or DynamoDB for authoritative player data.
Gateway: routes client connections to the correct game server instance. Handles WebSocket or UDP connections.

The stateful/stateless split is the most important architectural decision in this design. Your service layer scales horizontally with zero ceremony. Your game servers cannot. A running match cannot be transparently restarted, because 64 players are connected and their game state lives in memory. You cannot just spin up a new box and call it done. The match is the process.

Multiplayer game backend five-box architecture showing matchmaking, game server pool, service layer, data layer, and gateway. Stateful vs stateless boundaries highlighted.

The stateful/stateless boundary is where most interview answers fall apart.

TCP Will Get You Rejected

Fast-paced games use UDP, not TCP. Proposing WebSockets for the game server is the most common first mistake, because WebSockets run over TCP, and the reason TCP fails here is physical.

TCP guarantees ordered delivery. If one packet is lost, TCP pauses the stream and retransmits. That pause is called head-of-line blocking. At 128Hz it is catastrophic: a single dropped packet delays every subsequent position update until the lost one is recovered. Your character freezes. You die. You tell people the servers are bad.

UDP sends packets and forgets them. A lost position update from 16ms ago gets skipped. The next update arrives and the game moves on. Nobody cries.

The tradeoff: you have to implement reliability on top of UDP for messages that actually need it, like player death confirmation and round-end events. Libraries like Valve's GameNetworkingSockets and the open-source ENet handle this with two channels: unreliable for high-frequency position updates, reliable for low-frequency critical events. You get the best of both, without building it from scratch at 2am.
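A minimal sketch of the two-channel idea, for whiteboard purposes only. It is not how GameNetworkingSockets or ENet implement it. The header layout, `ReliableSender`, and the 100ms resend timeout are all assumptions made for illustration.

```python
import struct
from dataclasses import dataclass, field

UNRELIABLE, RELIABLE = 0, 1
HEADER = struct.Struct("!BI")  # channel (1 byte) + sequence number (4 bytes)


def encode(channel, seq, payload):
    """Prefix the payload with a channel id and sequence number."""
    return HEADER.pack(channel, seq) + payload


def decode(packet):
    """Split a packet back into (channel, seq, payload)."""
    channel, seq = HEADER.unpack_from(packet)
    return channel, seq, packet[HEADER.size:]


@dataclass
class ReliableSender:
    """Keeps reliable packets until they are acked; resends after a timeout."""
    resend_after: float = 0.1  # seconds, hypothetical
    next_seq: int = 0
    pending: dict = field(default_factory=dict)  # seq -> (last_sent_at, packet)

    def send(self, payload, now):
        packet = encode(RELIABLE, self.next_seq, payload)
        self.pending[self.next_seq] = (now, packet)
        self.next_seq += 1
        return packet  # caller hands this to the UDP socket

    def ack(self, seq):
        self.pending.pop(seq, None)

    def due_for_resend(self, now):
        """Return packets whose ack is overdue and restart their timers."""
        due = []
        for seq, (sent_at, packet) in list(self.pending.items()):
            if now - sent_at >= self.resend_after:
                self.pending[seq] = (now, packet)
                due.append(packet)
        return due
```

Position updates would go out with `encode(UNRELIABLE, ...)` and never enter `pending`. Only events like a death confirmation pay the bookkeeping cost.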

Transport layer pitch in an interview: UDP for game traffic, TLS/HTTPS for service layer APIs, WebSocket (or long-polling as fallback) for matchmaking and lobby. See the WebSockets vs long polling vs SSE breakdown for the lobby side.

The Tick Rate Decides Your Bill

The game server's simulation loop runs on a fixed interval called a tick. Each tick: process all client inputs, run physics, check collisions, broadcast updated state to every player.

20Hz: RPGs, slower-paced games
64Hz: most multiplayer shooters
128Hz: competitive FPS (Valorant)

At 128Hz the server has 7.8ms per tick. Miss that deadline and players see lag.
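A rough sketch of a fixed-rate loop with a per-tick deadline. The `step` callback stands in for the whole input, physics, collision, and broadcast sequence. The injectable `clock` and `sleep` parameters are assumptions added so the loop can be tested without real time passing.

```python
import time

TICK_RATE_HZ = 128
TICK_SECONDS = 1.0 / TICK_RATE_HZ  # 7.8125 ms per tick


def run_ticks(step, num_ticks, clock=time.perf_counter, sleep=time.sleep):
    """Call step(tick) at a fixed rate; return how many ticks missed the deadline."""
    overruns = 0
    next_deadline = clock() + TICK_SECONDS
    for tick in range(num_ticks):
        step(tick)  # process inputs, run physics, check collisions, broadcast
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)   # finished early: wait out the rest of the tick
        else:
            overruns += 1      # blew the budget: players see this as lag
        next_deadline += TICK_SECONDS  # deadlines are absolute, so drift does not accumulate
    return overruns
```

Scheduling against absolute deadlines rather than sleeping a fixed 7.8ms after each step keeps one slow tick from pushing every later tick back.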

Doubling tick rate roughly doubles CPU cost per server and bandwidth per player. At 128Hz, each player receives around 120 to 150kbps of game state. With 64 players per server, that's up to roughly 10Mbps of egress per instance. At 10,000 concurrent matches, you're looking at close to 100Gbps of outbound traffic. Network egress can run to 20 to 30% of infrastructure spend at scale for major studios. Mention that number in an interview and watch the interviewer nod.
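The egress arithmetic is worth being able to redo on the spot. A small sketch using the figures quoted above; the function name is hypothetical.

```python
def egress_estimate(kbps_per_player, players_per_match, concurrent_matches):
    """Return (Mbps per match instance, Gbps across the fleet)."""
    per_match_mbps = kbps_per_player * players_per_match / 1000
    fleet_gbps = per_match_mbps * concurrent_matches / 1000
    return per_match_mbps, fleet_gbps


# Upper end of the range: 150 kbps x 64 players x 10,000 matches
high = egress_estimate(150, 64, 10_000)   # about 9.6 Mbps per match, 96 Gbps total
# Lower end of the range: 120 kbps per player
low = egress_estimate(120, 64, 10_000)    # about 7.7 Mbps per match, 77 Gbps total
```

The "10Mbps" and "100Gbps" headline numbers are the upper end of the range, rounded up.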

The CPU problem is sneakier. Without spatial partitioning, collision detection is O(N²) across all entities, because every entity is checked against every other. A spatial grid or quadtree cuts this to roughly O(N) or O(N log N) for sparse worlds, since each entity is only compared against its neighbors. Every wall, every tree, every projectile needs physics checks 128 times per second. You do not want O(N²) at that cadence.
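A minimal sketch of a uniform-grid broad phase in 2D, assuming `cell_size` is at least as large as the biggest interaction distance so that anything that could collide sits in the same or an adjacent cell. The function name is hypothetical, and a real engine would follow this with an exact narrow-phase test.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(positions, cell_size):
    """Broad phase: index pairs that share a grid cell or sit in adjacent cells."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        grid[(int(x // cell_size), int(y // cell_size))].append(idx)

    pairs = set()
    for (cx, cy), members in grid.items():
        # Pairs inside the same cell.
        pairs.update(combinations(members, 2))
        # Check only half of the eight neighbors so each cell pair is visited once.
        for dx, dy in ((1, 0), (1, 1), (0, 1), (-1, 1)):
            for a in members:
                for b in grid.get((cx + dx, cy + dy), ()):
                    pairs.add((min(a, b), max(a, b)))
    return pairs
```

For a sparse world, most cells hold a handful of entities, so the work per entity stays roughly constant instead of growing with the total entity count.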

Tick rate comparison table (20Hz/64Hz/128Hz vs cost multiplier) and 7.8ms tick budget breakdown showing input processing, physics, hit detection, and state broadcast phases.

Miss the 7.8ms budget once and nobody notices. Miss it tick after tick and you have a lag complaint on your hands.

Three Tricks That Make Lag Invisible

An authoritative server model means the server is the source of truth. Clients cannot dictate outcomes. This prevents cheating but creates a problem: with 80ms of network round-trip, the game feels like you're playing through a wall.

Three techniques work together to hide that latency.

Client-side prediction: when a player presses forward, the client moves the character immediately without waiting for server confirmation. The input is sent to the server in parallel. The server processes it, returns the authoritative result, and the client reconciles. If the server agrees (it usually does), the player never noticed the delay. If it disagrees, the client snaps back. Players mostly just notice the agreement.

Server reconciliation: the client stores a buffer of unacknowledged inputs. When the server sends back its authoritative state, the client replays any inputs that happened after the server's timestamp. This corrects divergence without visible snapping. When it works, it's invisible. When it breaks, you're playing rubber band simulator.
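Prediction and reconciliation can be sketched together in one dimension. This sketch assumes inputs are tagged with sequence numbers rather than timestamps, and that client and server share the same `apply_input` rule. The class, field names, and `SPEED` constant are hypothetical.

```python
from dataclasses import dataclass, field

SPEED = 5.0  # units per second, hypothetical


def apply_input(x, move, dt):
    """Shared movement rule; client and server must run the same one."""
    return x + move * SPEED * dt


@dataclass
class PredictingClient:
    x: float = 0.0
    next_seq: int = 0
    pending: list = field(default_factory=list)  # (seq, move, dt) not yet acked

    def local_input(self, move, dt):
        """Prediction: apply the input immediately and remember it for replay."""
        seq = self.next_seq
        self.next_seq += 1
        self.x = apply_input(self.x, move, dt)
        self.pending.append((seq, move, dt))
        return seq, move, dt  # this tuple is what gets sent to the server

    def on_server_state(self, server_x, last_acked_seq):
        """Reconciliation: adopt the server's position, then replay unacked inputs."""
        self.pending = [p for p in self.pending if p[0] > last_acked_seq]
        self.x = server_x
        for _, move, dt in self.pending:
            self.x = apply_input(self.x, move, dt)
```

If the server's position matches what the client predicted, the replay lands on the same spot and nothing visibly changes. A mismatch shifts the player by the size of the disagreement, which is the rubber-banding described above.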

Entity interpolation: other players are shown slightly in the past, typically 50 to 100ms. The client buffers two or three position snapshots and interpolates smoothly between them. You are never guessing where someone is. You are showing where they were, slightly delayed. This is why "I shot him first" is sometimes true from your screen and false from the server's perspective.
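A small sketch of the interpolation step, in one dimension, assuming snapshots arrive as `(timestamp, x)` pairs with strictly increasing timestamps. The 100ms `INTERP_DELAY` is taken from the upper end of the range above; the function names are hypothetical.

```python
from bisect import bisect_right

INTERP_DELAY = 0.1  # render remote players 100 ms in the past


def interpolate(snapshots, render_time):
    """snapshots: list of (timestamp, x) sorted by timestamp."""
    times = [t for t, _ in snapshots]
    i = bisect_right(times, render_time)
    if i == 0:
        return snapshots[0][1]    # older than the buffer: clamp to the first snapshot
    if i == len(snapshots):
        return snapshots[-1][1]   # newer than the buffer: hold the last known position
    (t0, x0), (t1, x1) = snapshots[i - 1], snapshots[i]
    alpha = (render_time - t0) / (t1 - t0)
    return x0 + (x1 - x0) * alpha


def remote_position(snapshots, now):
    """Where to draw another player right now: their position INTERP_DELAY ago."""
    return interpolate(snapshots, now - INTERP_DELAY)
```

Holding the last known position when the buffer runs dry is one choice among several. Some games extrapolate briefly instead, at the cost of occasional overshoot.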

Together, these three techniques make 80ms of latency feel like 20ms. All three are described in Valve's production documentation and Gabriel Gambetta's widely cited series on fast-paced game networking. Valorant runs at 128Hz with server-authoritative hit registration: when the server processes a shot, it rewinds its state to what it looked like when the player's input was sent, checks whether the shot connected in that historical state, and confirms or rejects it. This server-side rewind is usually called lag compensation. A 100ms-ping player sees the same hitboxes as a 20ms-ping player. Fair. Mostly.
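The server-side rewind can be sketched as a short history buffer that hit checks read from. This is a one-dimensional illustration under assumed names (`HitboxHistory`, `shot_hits`, a 0.5-unit hit radius), not a description of how Valorant implements it.

```python
from collections import deque


class HitboxHistory:
    """Keeps recent per-tick positions so shots can be checked in the past."""

    def __init__(self, max_ticks=128):  # about one second of history at 128Hz
        self.frames = deque(maxlen=max_ticks)  # (tick, {player_id: x})

    def record(self, tick, positions):
        self.frames.append((tick, dict(positions)))

    def positions_at(self, tick):
        """Return the stored frame closest to `tick`, or None if nothing is stored."""
        if not self.frames:
            return None
        return min(self.frames, key=lambda frame: abs(frame[0] - tick))[1]


def shot_hits(history, shot_tick, aim_x, target_id, hit_radius=0.5):
    """Check the shot against where the target was when the input was sent."""
    frame = history.positions_at(shot_tick)
    if frame is None or target_id not in frame:
        return False
    return abs(frame[target_id] - aim_x) <= hit_radius
```

Capping the history length also caps how far back a high-ping player can reach, which is the usual lever for limiting "shot around the corner" complaints.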

Three-track timeline diagram showing client-side prediction (immediate), network round-trip (80ms RTT), server authoritative state, and entity interpolation buffer for other players.

Track 1: you think you moved. Track 3: the server decides if you moved. These two agree about 99% of the time.

Two Databases, Not One

Use Redis for hot state during a match and a durable store for everything persistent. Never the other way around.

During an active match, the game server holds all state in memory. It flushes to Redis periodically as a checkpoint. If a server crashes mid-match, recovery is generally not attempted. The checkpoint exists for session continuity in edge cases and for cross-service reads when other microservices need to know a player's current match status.

For persistent data (player rank, inventory, match history, currency), write to PostgreSQL or DynamoDB first, then cache. Never write to Redis first and flush to the database later. A crash between writes erases player progress. Players will email you. Their subject lines will not be kind.

DynamoDB patterns work well for gaming: player profile on the partition key, match history on a sort key, leaderboard via a global secondary index on score. The caching strategies guide covers write-through vs write-back in depth.

Player write flow:
  1. Write to DynamoDB (authoritative)
  2. Update Redis cache (for fast reads)
  3. Update in-memory game server state

Never reverse steps 1 and 2.
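The write order above can be sketched with plain dictionaries standing in for each tier. `durable`, `cache`, and `live_state` are hypothetical stand-ins for DynamoDB or PostgreSQL, Redis, and game server memory; no real client API is used here.

```python
def award_currency(player_id, amount, durable, cache, live_state):
    """Apply a balance change in durability order: store, then cache, then memory.

    If the durable write raises, the function exits before the cache or the
    in-memory state is touched, so neither can get ahead of the source of truth.
    """
    new_balance = durable.get(player_id, 0) + amount
    durable[player_id] = new_balance      # 1. authoritative write; may raise
    cache[player_id] = new_balance        # 2. cache for fast reads
    live_state[player_id] = new_balance   # 3. in-memory game server view
    return new_balance
```

A production version would also need an atomic read-modify-write on the durable store, for example a conditional update or a transaction, which this sketch leaves out.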

Three-tier data model: game server RAM (hot, volatile, sub-ms), Redis (warm, survives restart, sub-ms), DynamoDB/PostgreSQL (cold, authoritative, 1-10ms), S3 replay archives. Write-order warning highlighted.

Hot is fast and lies to you when the power goes out. Cold is slow and honest forever.

The Stateful Problem Everyone Fumbles

You cannot put a game server behind a round-robin load balancer and call it done. Players are connected. Their state lives in that specific process. All traffic for an active match must route to the exact server instance running that match.

Matchmaking assigns a player to a game server at match start and hands back the server's IP and port. The client connects directly. This is different from how stateless services work, and interviewers test it directly. Say "I'd put a load balancer in front of the game servers" without qualification and watch the follow-up question arrive immediately.

At scale, game server orchestration tools like Agones manage fleet lifecycle, health checking, and autoscaling on Kubernetes. A pool of warm standby instances absorbs demand spikes without cold-start delay. When a match ends, the instance returns to the pool. It's a connection pool, but for entire game simulations.
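A toy model of the warm pool and the "hand back IP and port" step. This is not the Agones API. `GameServer`, `ServerPool`, and the example address are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GameServer:
    ip: str
    port: int
    match_id: Optional[str] = None


@dataclass
class ServerPool:
    warm: list = field(default_factory=list)     # idle, already-booted instances
    active: dict = field(default_factory=dict)   # match_id -> GameServer

    def allocate(self, match_id):
        """Pin a warm instance to a match; clients then connect to its ip:port directly."""
        if not self.warm:
            return None  # caller queues the match or triggers a scale-up
        server = self.warm.pop()
        server.match_id = match_id
        self.active[match_id] = server
        return server

    def release(self, match_id):
        """Match ended: return the instance to the warm pool."""
        server = self.active.pop(match_id)
        server.match_id = None
        self.warm.append(server)
```

The `active` map is the routing table: every packet for a match goes to the one instance recorded there, which is why a round-robin balancer has no place in this path.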

Geographic distribution follows the players. Competitive games constrain matchmaking to 80ms round-trip or less. A player in Singapore and a player in London do not belong in the same match. Run game servers in every major region and use consistent hashing to assign players within a region to specific server pools.
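The latency constraint can be expressed as a filter that runs before any skill matching. A minimal sketch, assuming each client reports a measured round-trip time per region; the region codes and function names are hypothetical.

```python
MAX_RTT_MS = 80  # competitive limit quoted above


def eligible_regions(rtt_by_region, max_rtt_ms=MAX_RTT_MS):
    """Regions within the RTT limit for one player, best ping first."""
    within = [(rtt, region) for region, rtt in rtt_by_region.items()
              if rtt <= max_rtt_ms]
    return [region for _, region in sorted(within)]


def shared_region(players_rtts, max_rtt_ms=MAX_RTT_MS):
    """Region every player can reach within the limit, or None if there is none."""
    if not players_rtts:
        return None
    common = set.intersection(
        *(set(eligible_regions(r, max_rtt_ms)) for r in players_rtts))
    if not common:
        return None
    # Prefer the region with the lowest worst-case RTT; the name breaks ties.
    return min(common,
               key=lambda region: (max(r[region] for r in players_rtts), region))
```

With a Singapore player at 180ms to London and a London player at 185ms to Singapore, `shared_region` returns `None`, and the two never land in the same match.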

The Tradeoffs Worth Naming Out Loud

Name these explicitly in the last five minutes. This is where scores separate. Most candidates describe the system. Strong candidates argue for it.

Tick rate vs cost: 128Hz is roughly twice the infrastructure cost of 64Hz. Valorant pays it for competitive integrity. Most games do not need to. A casual battle royale at 64Hz saves real money.

Authoritative server vs peer-to-peer: P2P eliminates server cost but makes cheating trivial. Every action game worth playing uses an authoritative server. You are choosing consistency and security over operational simplicity. Say that explicitly.

Client-side prediction vs simplicity: prediction code is genuinely hard to implement correctly. Turn-based games skip it entirely. Real-time games with significant latency cannot. The complexity cost is real and worth acknowledging.

Single-region vs multi-region: single-region cuts operational complexity but gives international players 200ms ping. Multi-region requires player data replication and adds coordination overhead. The CAP theorem tradeoffs apply here directly.

The Recap

Mental model: real-time state machine, not request-response
Protocol: UDP for game traffic, reliable UDP for critical events, WebSocket for lobby
Server model: authoritative server with client-side prediction, server reconciliation, and entity interpolation
Tick rate: 64Hz baseline, 128Hz for competitive; each doubling roughly doubles CPU and bandwidth cost
State: in-memory during match, Redis for checkpoints, PostgreSQL or DynamoDB for persistence
Scaling: stateless service layer scales freely; stateful game servers require direct client routing and fleet orchestration
Matchmaking: constrain by latency first, skill second; expand skill bands over time to reduce queue wait

Under interview pressure, this design has more moving parts than most. The hard part isn't memorizing the components. It's explaining the reasoning live while an interviewer pushes back on your tick rate choice or asks why you didn't use WebSockets. SpaceComplexity runs voice-based system design mock interviews with real-time feedback on your communication, tradeoff reasoning, and pacing, because reading about it and saying it out loud are very different skills.