Design a Live Sports Score System: The 45-Minute Walkthrough

A World Cup final. One goal scored. Within five seconds, 50 million people need to see the scoreline change.

That's the problem. A soccer match produces maybe three goal events across 90 minutes. The writes are trivial. The reads are the challenge. A single write fans out to tens of millions of open connections, all expecting the update in near real-time.

Most system design answers handle the first part fine. The second part is where candidates go quiet, staring into the middle distance like they just remembered they left the oven on.

Here's how to walk through this live score system design in 45 minutes.

Five Minutes on Requirements

Start here. The scope shapes every decision downstream, and interviewers can tell when you skipped it.

Functional requirements:

Users view live scores for ongoing matches
Scores update within 5 seconds of the real-world event
Users view completed match results and historical stats
The system covers multiple sports and hundreds of concurrent matches

Non-functional requirements:

50 million concurrent users at peak (World Cup final)
99.99% availability during live events
Eventual consistency is acceptable for score display (a 5-second lag between two users is fine; this is not a betting platform)

Out of scope for this interview (say this out loud): video streaming, authentication, in-app betting, fantasy sports, live chat. Mention them as extension points, then move on.

The one clarification you must ask: do users send any data back to the server during the match, like reactions, polls, or in-app bets? The answer changes your delivery mechanism. Display-only scores can use SSE. Bidirectional interaction needs WebSockets. Get this answered before you draw anything.

Two Numbers That Shape the Whole System

Before architecture, run the quick estimation. Interviewers want to see you think about scale before you commit to a design.

50 million concurrent connections at peak. JioHotstar documented 65 million simultaneous viewers for an India vs England T20 semifinal in 2026. That's your real-world ceiling.
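1 to 3 meaningful score changes per match per 90 minutes for soccer. Across 500 concurrent matches globally, that's at most 1,500 score changes spread over 5,400 seconds, or well under one score write per second. Even if you count every card, substitution, and stat update, you are still in the tens of writes per second. The write rate is almost nothing.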

The fundamental asymmetry: trivial write rate, enormous read rate. Your entire architecture is about fanning out a handful of writes to tens of millions of waiting connections. That's the problem to solve. Everything else is plumbing.
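If you want to sanity-check the asymmetry on the whiteboard, the arithmetic is short. The sketch below counts score changes only (cards, substitutions, and other events would raise the write figure), and the constant names are made up for illustration.

```python
# Back-of-envelope numbers. Every constant is an assumption taken from the
# estimates above, not a measurement.
MATCH_SECONDS = 90 * 60           # one soccer match, ignoring stoppage time
CONCURRENT_MATCHES = 500
SCORE_CHANGES_PER_MATCH = 3       # upper end of the 1-to-3 range
PEAK_CONNECTIONS = 50_000_000


def score_writes_per_second(matches: int, changes_per_match: int) -> float:
    """Average score writes per second across all live matches."""
    return matches * changes_per_match / MATCH_SECONDS


def worst_case_fan_out(connections: int) -> int:
    """Deliveries triggered by one write if everyone watches the same match."""
    return connections


writes = score_writes_per_second(CONCURRENT_MATCHES, SCORE_CHANGES_PER_MATCH)
print(f"average score writes/sec: {writes:.2f}")
print(f"deliveries per write (worst case): {worst_case_fan_out(PEAK_CONNECTIONS):,}")
```

Less than one write per second on one side, tens of millions of deliveries per write on the other. That ratio is the whole interview.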

High-Level Architecture: Four Layers

Draw this before going deep on any component. Establish the big picture, then zoom in.

[Figure: Four-layer pipeline from stadium to 50M clients, showing the push path via Kafka and Redis pub/sub on the left and the pull path via CDN edge and REST API on the right.]
Two distinct traffic patterns. Push delivery for connected clients watching live. Pull delivery for clients requesting scores on demand.

The architecture falls into two paths:

Push path (live watching): Stadium data provider feeds into an Ingestion Service, into Kafka, into a Score Processor, and then Redis pub/sub fans it out to a fleet of WebSocket/SSE servers connected to your 50 million clients.

Pull path (on-demand): Client hits CDN edge, which absorbs 90 percent of requests before they reach the origin Score REST API backed by Redis.

Most of your traffic can be absorbed at the CDN edge with a short TTL, and that's not a compromise. It's the right design.

Data Ingestion: You Don't Own the Stadium

Most candidates design the ingest layer as if they have a sensor bolted to the goal post. You don't.

The data comes from a provider like Sportradar or Stats Perform. Sportradar has 12,600 scouts covering 900,000-plus events a year. A scout at the stadium enters a goal on a tablet app. That event reaches Sportradar's systems in under a second. Then it gets pushed to your backend via a long-lived HTTP chunked streaming connection or a WebSocket feed. From the real-world event to your ingestion service, the latency is 3 to 5 seconds for consumer apps. That 5-second SLA you promised? Most of it is already spent before you touch a line of code.

[Figure: Scout at venue entering an event on a tablet, flowing through Sportradar to your Ingestion Service, which then writes to Kafka and Redis pub/sub in parallel.]
The pipeline most candidates don't draw. Upstream latency: 3 to 5 seconds from the kick to your ingestion service.

Your ingestion service receives the event, validates it, writes the new score to Redis, appends the event to Kafka (the durable record), and publishes a lightweight delta to the appropriate Redis pub/sub channel. That's the entire write path.
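A minimal sketch of that write path, assuming in-memory stand-ins for the Redis hash, the Kafka topic, and the pub/sub channel so it runs without any infrastructure. The event fields, the `InMemoryBackends` class, and the sequence-number duplicate check are illustrative assumptions, not the provider's actual schema.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class InMemoryBackends:
    """Stand-ins for a Redis hash, a Kafka topic, and Redis pub/sub."""
    state: dict = field(default_factory=dict)       # match_id -> current score
    log: list = field(default_factory=list)         # append-only durable record
    published: list = field(default_factory=list)   # (channel, payload) pairs


# Hypothetical minimum shape of a provider event.
REQUIRED_FIELDS = {"match_id", "sequence", "home_score", "away_score"}


def ingest(event: dict, backends: InMemoryBackends) -> bool:
    """Validate one provider event, then perform the three writes."""
    if not REQUIRED_FIELDS.issubset(event):
        return False  # malformed event: reject before touching any store
    match_id = event["match_id"]
    current = backends.state.get(match_id)
    if current is not None and event["sequence"] <= current["sequence"]:
        return False  # duplicate or out-of-order delivery (assumed check)

    snapshot = {
        "home_score": event["home_score"],
        "away_score": event["away_score"],
        "sequence": event["sequence"],
    }
    backends.state[match_id] = snapshot                          # 1. current score
    backends.log.append({**event, "ingested_at": time.time()})   # 2. durable record
    backends.published.append(                                   # 3. lightweight delta
        (f"match:{match_id}:events", json.dumps(snapshot))
    )
    return True


backends = InMemoryBackends()
goal = {"match_id": 42, "sequence": 31, "home_score": 1, "away_score": 0}
print(ingest(goal, backends))   # first delivery is accepted
print(ingest(goal, backends))   # replay of the same sequence is dropped
```

In an interview, it's worth saying out loud that provider feeds can redeliver events, so some form of idempotency check belongs on this path even if the exact mechanism differs.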

One nuance worth stating: Sportradar's REST endpoints cache responses for a minimum of 3 seconds on their end. Polling every second does nothing. If you need sub-5-second freshness, you need their push feed, not polling.

The Fan-Out Problem: Where the Interview Actually Lives

One score update. Fifty million clients. This is the part that separates a passing answer from a strong hire signal.

The naive approach: one server handles all connections. One server cannot hold 50 million connections. Dead end.

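A better approach: distribute clients across many WebSocket servers behind a load balancer. Problem: WebSocket connections are stateful. A connection lives on one specific server, and only that server can write to it. An established socket stays pinned to its server, but when a client reconnects (or uses an HTTP fallback transport that makes several requests), a load balancer without sticky sessions can route it to a server that has no idea that client was ever connected.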

You need either sticky sessions (load balancer routes a client to the same server based on IP hash or a cookie) or distributed state (session data stored externally so any server can resume any client's connection).

The standard production solution is the Redis pub/sub backplane. Each WebSocket server subscribes to the Redis channels for the matches its clients are watching. When the score processor receives an update, it publishes one message to match:{id}:events in Redis. Every WebSocket server subscribed to that channel receives the message and fans out locally to its connected clients. One publish, millions of writes. CricInfo documented delivering 1 million WebSocket updates in under 100ms using exactly this pattern.

[Figure: Score processor publishes one message to a Redis pub/sub channel, which fans out to N WebSocket servers each holding 500K connections, totaling 50 million deliveries in under 100ms.]
One Redis PUBLISH. Fifty million deliveries. The arrow widths are not to scale because nothing would be visible on the left side.
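To make the backplane concrete, here is a toy version with an in-memory class standing in for Redis pub/sub and plain lists standing in for client sockets. `Backplane` and `SocketServer` are hypothetical names; only the `match:{id}:events` channel naming comes from the design above.

```python
from collections import defaultdict


class Backplane:
    """In-memory stand-in for Redis pub/sub: channel -> subscribed servers."""

    def __init__(self):
        self.subscribers = defaultdict(set)

    def subscribe(self, channel, server):
        self.subscribers[channel].add(server)

    def publish(self, channel, message) -> int:
        """Deliver to every subscribed server; return how many received it."""
        servers = self.subscribers.get(channel, ())
        for server in servers:
            server.on_message(channel, message)
        return len(servers)


class SocketServer:
    """One WebSocket/SSE server. Each client is modelled as an inbox list."""

    def __init__(self, backplane):
        self.backplane = backplane
        self.clients_by_channel = defaultdict(list)
        self.messages_handled = 0

    def connect(self, match_id: int) -> list:
        channel = f"match:{match_id}:events"
        if channel not in self.clients_by_channel:
            # Subscribe only when the first local client asks for this match.
            self.backplane.subscribe(channel, self)
        inbox = []
        self.clients_by_channel[channel].append(inbox)
        return inbox

    def on_message(self, channel, message):
        self.messages_handled += 1
        for inbox in self.clients_by_channel[channel]:
            inbox.append(message)   # local fan-out to connected clients


backplane = Backplane()
server_a, server_b = SocketServer(backplane), SocketServer(backplane)
football = [server_a.connect(1) for _ in range(3)] + [server_b.connect(1)]
cricket = server_b.connect(2)

reached = backplane.publish("match:1:events", '{"home_score": 1, "away_score": 0}')
print(reached, [len(inbox) for inbox in football], len(cricket))
```

One publish reaches both servers, each server writes to its own clients, and the cricket viewer hears nothing. A production version would also unsubscribe when the last local client for a match disconnects.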

Two rules that keep this efficient:

First, use per-match channels. A client watching a Premier League match has no interest in a cricket score update. If you use one global score-updates channel, every WebSocket server processes every score update, even for matches none of its clients are watching. Use match:{id}:events so each server's subscription scope matches its clients' interests.

Second, coalesce updates for slow clients. If a client's socket buffer fills up (slow network, mobile in a tunnel), don't queue every intermediate delta. Keep a pointer to the latest state and send that when the connection drains. A slow consumer should never block the pipeline.
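The coalescing rule fits in a few lines. This sketch assumes each update carries enough state to stand on its own (so dropping intermediates is safe) and models the socket as a `writable` flag plus a `sent` list. All names are illustrative.

```python
class CoalescingClient:
    """Holds at most one pending update per client: the newest state."""

    def __init__(self):
        self.pending = None     # latest undelivered state, if any
        self.writable = True    # False while the socket buffer is full
        self.sent = []          # stand-in for bytes written to the socket

    def offer(self, state: dict) -> None:
        """Called for every update; overwrites whatever is still waiting."""
        self.pending = state
        self.flush()

    def flush(self) -> None:
        """Send the pending state if the socket can take it."""
        if self.writable and self.pending is not None:
            self.sent.append(self.pending)
            self.pending = None

    def on_drain(self) -> None:
        """Called when the socket becomes writable again."""
        self.writable = True
        self.flush()


client = CoalescingClient()
client.offer({"minute": 70, "home_score": 1})
client.writable = False                       # phone enters the tunnel
client.offer({"minute": 71, "home_score": 1})
client.offer({"minute": 72, "home_score": 2})
client.on_drain()                             # phone leaves the tunnel
print(client.sent)
```

Memory per slow client stays constant no matter how long the stall lasts, which is the property you want when millions of phones share a bad network moment.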

A single well-tuned Linux server can hold 500,000 idle WebSocket connections with 16 GB RAM. At 1 million connections you need 2 servers. At 50 million connections you need roughly 100 servers before any headroom, and the Redis pub/sub bridge keeps them synchronized.

SSE or WebSocket: Pick One and Defend It

For score display only: SSE is the right choice. Server-Sent Events are standard HTTP. They work through normal load balancers without sticky session configuration. They reconnect automatically on disconnect. They're one-way by design (server to client), which is exactly what a score feed requires.

Most candidates choose WebSockets by default because it sounds more capable. It's the system design equivalent of using a table saw to slice bread. SSE is equally real-time for server-push workloads and significantly simpler to operate at scale.

For apps with bidirectional features (reactions, polls, in-app bets): use WebSockets. The connection is full-duplex and both sides can initiate messages.

State the tradeoff, then pick based on whether the client sends data. See WebSockets vs Long Polling vs SSE for the full comparison.
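If the interviewer asks what SSE looks like on the wire, it's plain text over a response with Content-Type text/event-stream. Here is a small formatter as a sketch; the function name and the "score" event name are made up, while the `id:`, `event:`, and `data:` fields and the blank-line terminator are part of the SSE format.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")     # echoed back on reconnect
    if event is not None:
        lines.append(f"event: {event}")     # lets the client dispatch by type
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")      # multi-line payloads need one field per line
    return "\n".join(lines) + "\n\n"        # a blank line ends the event


frame = format_sse('{"home_score": 2, "away_score": 1}', event="score", event_id="31")
print(frame, end="")
```

Setting `id:` to the event's sequence number pays off on reconnect: the browser's EventSource sends the last seen ID in a Last-Event-ID header, so the server can tell whether the client missed anything.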

Caching: The Part Everyone Underestimates

CDN edge caching with a 10-to-30-second TTL is highly effective for live scores, even though scores change. That's the counterintuitive part. Most candidates hear "live" and assume CDN caching is off the table. It isn't.

Most users don't need sub-second updates. They need sub-30-second updates. A CDN edge node serving cached score payloads absorbs 90 percent of REST requests before they reach your origin. ESPN serves 100,000 requests per second on a few hundred servers using exactly this strategy.

Your caching layers, fastest to slowest:

Layer 1: In-process cache on each score API server. Eliminates Redis round trips for the handful of matches that millions of clients are watching right now. A local LRU cache with a 5-second TTL handles this.

Layer 2: Redis as the shared, authoritative read store for live match state (the event log remains the durable record). Hash per match: HSET match:{id} home_score 2 away_score 1 status live minute 72. Sorted sets for live standings: ZADD standings:{league_id} {points} {team_id}. Sub-millisecond reads from anywhere in your fleet.

Layer 3: CDN edge for REST score endpoints. 10-to-30-second TTL. When a score changes, send a surrogate key purge to the CDN to invalidate the affected entry immediately rather than waiting for TTL to expire.
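For Layer 1, a local LRU cache with a TTL is about thirty lines. This is a sketch rather than a recommendation to hand-roll one in production; the class name, the default sizes, and the injectable clock (there so the expiry logic can be tested without sleeping) are all assumptions.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small in-process LRU cache where entries also expire after a TTL."""

    def __init__(self, max_items=1024, ttl_seconds=5.0, clock=time.monotonic):
        self.max_items = max_items
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()   # key -> (expires_at, value), oldest first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]      # expired: caller falls through to Redis
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self.clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)   # evict least recently used


now = [0.0]                                   # fake clock for the demo
cache = TTLCache(max_items=2, ttl_seconds=5.0, clock=lambda: now[0])
cache.put("match:1", {"home_score": 2, "away_score": 1})
print(cache.get("match:1"))                   # served locally, no Redis round trip
now[0] = 5.0
print(cache.get("match:1"))                   # expired after the 5-second TTL
```

The TTL bounds how stale a single API server can be, and the size cap keeps memory flat even when hundreds of matches are live.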

The thundering herd on cache miss: when a hot key expires, every app server simultaneously tries to recompute the new value. Use a distributed lock (SET lock:match:{id} <token> NX EX 1) so only one server recomputes while others wait and then read the refreshed value. Note that SET needs a value argument before the NX and EX options; a random token per caller works.
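Here is the lock pattern as a runnable sketch. `FakeRedis` is a tiny in-memory stand-in that implements only GET and SET with NX/EX behavior, so the example needs no server; the `score:{id}` cache key, the 30-second cache expiry, and the function names are assumptions for illustration.

```python
import time
import uuid


class FakeRedis:
    """In-memory stand-in covering only GET and SET with NX/EX semantics."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._data = {}   # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, nx=False, ex=None):
        if nx and self.get(key) is not None:
            return False                      # key exists: NX refuses to overwrite
        expires_at = (self.clock() + ex) if ex is not None else None
        self._data[key] = (value, expires_at)
        return True


def read_score(store, match_id, recompute):
    """Return (score, recomputed_here). Only the lock holder recomputes."""
    cached = store.get(f"score:{match_id}")
    if cached is not None:
        return cached, False
    token = uuid.uuid4().hex                  # unique value identifies the holder
    if store.set(f"lock:match:{match_id}", token, nx=True, ex=1):
        value = recompute(match_id)
        store.set(f"score:{match_id}", value, ex=30)
        return value, True
    return None, False                        # someone else holds the lock: retry shortly


store = FakeRedis()
print(read_score(store, 7, lambda match_id: "2-1"))   # miss: this caller recomputes
print(read_score(store, 7, lambda match_id: "2-1"))   # hit: served from cache
```

The one-second expiry on the lock matters as much as the NX flag: if the lock holder crashes mid-recompute, the lock frees itself and another server takes over.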

For deeper caching patterns, caching strategies for system design covers cache-aside, write-through, and read-through in detail.

Data Model: Score Corrections Are Inevitable

Two stores with different roles.

PostgreSQL for structured metadata: teams, leagues, seasons, schedules, player rosters. Infrequently updated, relational, needs integrity constraints. Boring. Perfect.

An append-only event log for the match itself. Each event is a record: (match_id, sequence_number, event_type, timestamp, player_id, metadata_json). Goals, yellow cards, substitutions, VAR decisions. The current score is derived from this log and cached in Redis.

You need event sourcing here because VAR exists. When a goal is overturned, you don't update the original event row. You append a new GOAL_CANCELLED event with a higher sequence number and recompute the score from the log. This preserves the full audit trail and handles corrections without data loss.

[Figure: Append-only event log showing GOAL at sequence 31 and a GOAL_CANCELLED compensating event at sequence 47, with an arrow showing the materialized score being recomputed in Redis as 0-0.]
VAR overturns the goal. The log gains one entry. The original goal row is untouched. Every client gets a full state push.
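Deriving the score from the log is a simple fold. In this sketch the `team` and `cancels_sequence` fields are hypothetical (in the record layout above they would live inside metadata_json), and the sequence numbers match the VAR example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchEvent:
    match_id: int
    sequence_number: int
    event_type: str                          # "GOAL", "GOAL_CANCELLED", "YELLOW_CARD", ...
    team: str                                # "home" or "away" (hypothetical field)
    cancels_sequence: Optional[int] = None   # set only on GOAL_CANCELLED (hypothetical)


def current_score(events) -> dict:
    """Recompute the score from the full log, skipping cancelled goals."""
    cancelled = {
        e.cancels_sequence for e in events if e.event_type == "GOAL_CANCELLED"
    }
    score = {"home": 0, "away": 0}
    for e in events:
        if e.event_type == "GOAL" and e.sequence_number not in cancelled:
            score[e.team] += 1
    return score


log = [
    MatchEvent(9, 31, "GOAL", "home"),
    MatchEvent(9, 40, "YELLOW_CARD", "away"),
]
print(current_score(log))                    # score while the goal stands

log.append(MatchEvent(9, 47, "GOAL_CANCELLED", "home", cancels_sequence=31))
print(current_score(log))                    # score after VAR; sequence 31 is still in the log
```

Nothing is updated or deleted. The correction is one more append, and the materialized score in Redis is just the latest output of this function.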

When a correction fires, push a full state snapshot to all connected clients rather than a delta. Delta-based updates are fine for additive events (goal scored). Corrections need a full push so every client converges to the correct score simultaneously, regardless of what partial state they held before. The correction event goes through the same Redis pub/sub backplane as normal updates but carries a type: full_state flag that tells clients to replace rather than patch.
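On the client side, replace-versus-patch is a single branch. The message shapes below are assumptions for illustration; only the `type: full_state` flag comes from the design above.

```python
def apply_message(state: dict, message: dict) -> dict:
    """Return the client's new state after one server message."""
    if message.get("type") == "full_state":
        return dict(message["state"])        # replace: ignore whatever we held
    if message["sequence"] <= state.get("sequence", -1):
        return state                         # stale or duplicate delta: ignore
    patched = dict(state)
    patched.update(message["changes"])       # patch: additive event
    patched["sequence"] = message["sequence"]
    return patched


start = {"home_score": 0, "away_score": 0, "sequence": 30}
goal = {"type": "delta", "sequence": 31, "changes": {"home_score": 1}}
correction = {
    "type": "full_state",
    "state": {"home_score": 0, "away_score": 0, "sequence": 47},
}

saw_goal = apply_message(start, goal)        # this client shows 1-0
missed_goal = start                          # this one was in a tunnel, still 0-0

print(apply_message(saw_goal, correction))
print(apply_message(missed_goal, correction))
```

Both clients land on the same state without the server needing to know what either one held before. That's the reason to pay the extra bytes for a snapshot on corrections.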

The 45-Minute Clock

Minutes	Focus
0-5	Requirements, clarifications, scale numbers
5-15	High-level architecture, four layers, two traffic patterns
15-25	Ingest pipeline, Kafka, Redis write path
25-35	Fan-out problem, Redis pub/sub backplane, WebSocket vs SSE
35-42	Caching layers, data model, event sourcing for corrections
42-45	Tradeoffs, extension points, questions from interviewer

If you only have time to go deep on one thing, make it the fan-out problem. It's the design challenge that distinguishes live score systems from ordinary CRUD APIs. The interviewer can look up how to design a REST endpoint. They want to see how you think about one event reaching 50 million open connections.

Key Tradeoffs to Mention

SSE vs WebSocket: SSE is simpler and sufficient for display-only. WebSocket is required for bidirectional interaction.

Redis pub/sub vs Kafka: Kafka is durable and replayable; multiple independent consumers (stats pipeline, notification service, odds engine) can independently read from it. Redis pub/sub is lower latency and fire-and-forget; messages are lost if a server temporarily disconnects. That is acceptable because the current state always lives in the Redis hash: a server re-reads it when it resubscribes, and a client re-requests it on reconnect.

Sticky sessions vs distributed state: Sticky sessions are simple but fragile. A server crash reconnects all its clients at once (thundering herd). Storing session context in Redis makes failover invisible to clients. The hybrid is common: sticky sessions for normal operation, Redis-backed reconnect on failure.

Per-match channels vs global channel: Global is simpler to implement. Per-match is strictly better at scale because fan-out scope is limited to matches clients are actually watching.

Display users vs betting users: A 10-second stale CDN cache is fine for someone checking the scoreline on their phone. It's completely wrong for someone placing an in-play bet. Partition these two user classes and bypass the cache entirely for betting clients, routing them directly to Redis reads. This is an extension point, not part of the base design.

Recap

The write rate is trivial. The read rate is the problem.
Data arrives from a third-party push feed (Sportradar), not a sensor you own. Upstream latency is 3 to 5 seconds.
Kafka at the ingest layer for durability and multiple consumers. Redis pub/sub at the delivery layer for speed.
Per-match Redis pub/sub channels fan out one published message to all WebSocket servers and their connected clients in under 100ms.
SSE for display-only. WebSockets when the client also sends data.
CDN edge with a 10-to-30-second TTL absorbs 90 percent of REST traffic. It's not a compromise.
Event sourcing in the data model handles VAR corrections without corrupting history.
When a correction fires, push a full state snapshot, not a delta.
Partition display users (eventual consistency fine) from betting users (strong consistency required, bypass cache).

If you want to practice walking through this out loud, under time pressure, with live follow-up questions like "now assume 100 million users connect in the first minute of the match," try SpaceComplexity. It runs voice-based system design mock interviews with rubric-based feedback on your architecture reasoning, not just whether you named the right components.