Design a Distributed Counter: Simple Premise, Brutal Tradeoffs

- Write/read path separation is the core insight: fan writes to shards, aggregate on read, cache the result with a short TTL.
- Application-level sharding with random shard selection turns 80K writes per second into 800 per shard, eliminating hot partitions.
- Idempotent increments via a
request_idfield prevent duplicate counts during client retries. - Three consistency models: buffered writes for throughput, G-Counter CRDT for partition tolerance, serialized leader for exact counts like inventory.
- Kafka vs Redis: Kafka adds durability and replay at the cost of latency; Redis is faster but loses in-flight increments on crash.
- Hit bottleneck discussion by minute 25 in the interview. Tradeoffs are what separates hire from strong hire.
A counter is the simplest data structure imaginable. An integer. You increment it. You read it. Done. Ship it.
Distribute that integer across ten machines handling ten million requests per second, and suddenly you're navigating CAP theorem, CRDT design, and hot-key mitigation all at once. The distributed counter is a favorite system design interview problem because it hides three hard tradeoffs inside a three-word spec. The interviewer already knows the answer isn't "just use Redis." They're waiting to see if you do.
Full walkthrough: requirements, architecture, data model, API design, scaling, and how to narrate it in 45 minutes.
The First Five Minutes Set the Architecture
Don't touch the whiteboard yet. (You'll be erasing it in three minutes anyway.) A counter for page views on a news site has completely different requirements than a counter for inventory in a flash sale.
Ask these before you draw a single box:
- What's being counted? Page views, likes, inventory, API rate limits?
- How many write operations per second? 10K? 1M? Higher?
- Does the read need to be exact? A like count showing 99,847 instead of 99,850 is fine. Inventory showing 1 unit available when it's actually 0 is not. (The customer already clicked Buy. Twice.)
- What's the failure tolerance? Can we lose a handful of increments during a crash?
- How many distinct counters? One global counter versus millions of per-user counters changes your data model entirely.
For this walkthrough: page views on a social platform, 100K writes per second at peak, millions of distinct counters, reads that tolerate a few seconds of staleness, and a few lost increments during crashes is acceptable.
Those last two answers just unlocked your entire scaling strategy.
Distributed Counter Architecture: Decouple Writes from Reads
The naive design is one node:
Client → Counter Service → Redis (single node)
Redis INCR is atomic and O(1). A single Redis node handles roughly 100K to 180K operations per second on standard hardware. At first glance, that's enough. Add replication lag, failover time, memory pressure from millions of keys, and one viral post that gets 80K of those writes, and you're already in trouble.
The correct design decouples increments from reads.

Write path takes raw increments. Read path aggregates and caches. Same API tier, different storage shapes.
Writes are fast and cheap. Reads aggregate shards or a replica. This separation is the core design insight the interviewer wants to see.
The Schema That Makes Sharding Work
For a sharded counter, the data model looks like this:
counter_shards: counter_id VARCHAR -- which counter (e.g., "post:12345") shard_id SMALLINT -- which shard bucket (0 to N-1) value BIGINT -- partial count for this shard PRIMARY KEY (counter_id, shard_id)
At write time, pick a shard by hashing the request:
shard_id = hash(request_id) % NUM_SHARDS # or random.randint(0, NUM_SHARDS - 1) redis.incr(f"{counter_id}:{shard_id}")
At read time, aggregate:
total = sum( redis.get(f"{counter_id}:{i}") or 0 for i in range(NUM_SHARDS) )

One logical counter. N physical keys. The read aggregates them all and caches the result.
With 16 shards, each shard handles 1/16 of the writes for any given counter. This is application-level sharding: spread writes across N buckets, aggregate on read.
The tradeoff is read cost. A GET on a single key becomes 16 GETs. Cache the aggregated value with a short TTL (5 to 10 seconds) to make reads cheap again.
Keep the API Small and Make Increments Idempotent
POST /counters/{counter_id}/increment
Body: { "delta": 1, "request_id": "uuid-abc-123" }
Returns: { "ok": true }
GET /counters/{counter_id}
Returns: { "counter_id": "post_12345", "value": 99847 }
The request_id field is your only defense against duplicate increments. During a network partition, the client may retry an increment that already succeeded. Without idempotency, a single user click becomes five increments. Store processed request IDs with a short TTL in Redis (a SET with EXPIRE) and reject duplicates.
For rate-limiting use cases, add a TTL to the counter itself so it automatically expires:
POST /counters/{counter_id}/increment
Body: { "delta": 1, "ttl_seconds": 3600 }
Hot Keys Will Kill You
Even with 16 shards, a viral post receiving 80K writes per second fans out to 5K writes per second per shard. On a 3-node Redis cluster where each node owns about 5 shards, one node still handles 25K writes per second for a single counter. That's a hot partition. One celebrity tweet just turned your distributed system back into a single-node problem.
The fix is to increase the shard count only for hot counters, picking the shard at random rather than by hash:
# Write: random shard selection distributes traffic evenly shard_key = f"{counter_id}:{random.randint(0, 99)}" redis.incr(shard_key)
With 100 shards and random selection, 80K writes per second fan out to 800 writes per second per shard. That's well inside any reasonable limit. The read aggregates 100 GET operations instead of 1, which costs about 5 milliseconds total, and you cache the result anyway.
See hot partition patterns for how this problem appears in other system design contexts.
The Three Consistency Models, and When to Pick Each
This is where the interview separates "hire" from "strong hire." You need a clear position on consistency, not a vague wave at CAP theorem.
1. Buffered writes (eventual, high throughput)
Batch increments client-side or in a service layer and flush every N seconds. Accept a 2 to 5 second accuracy window. This is what most social platforms use for view counts and likes. You lose a few increments if the buffer crashes before flushing, but the user never notices. Nobody has ever filed a support ticket because their like count showed 9,999 instead of 10,001.
Use this for: page views, like counts, engagement metrics.
2. G-Counter CRDT (eventual, no coordination)
Each node maintains its own slice of the counter. The G-Counter is a conflict-free replicated data type where the value is the sum of all node slices, and the merge rule is element-wise maximum:
Node A sees: [5, 0, 0]
Node B sees: [0, 3, 0]
Node C sees: [0, 0, 7]
Merge: [5, 3, 7] (max per slot)
Total: 5 + 3 + 7 = 15
No locks, no coordination. Nodes gossip and converge. The math forces correctness rather than hoping every node does the right thing. That is rarer than it sounds.

Each node owns its own slot. The merge is element-wise max. Agreement is guaranteed, not negotiated.
A PN-Counter extends this with a second G-Counter for decrements: value = sum(P) - sum(N). Both are provably correct under any network partition.
Use this for: distributed systems where nodes can't always talk to each other.
3. Strong consistency (serialized writes)
Serialize all writes through a single leader per counter partition, using Raft or a distributed lock. Every increment is acknowledged only after it's committed. Exact counts, low write throughput.
Use this for: inventory management, financial tallies, anything where "one more than available" causes real harm.
For the interview scenario (page views, staleness acceptable), buffered writes is the right answer. State it confidently and explain why.
See CAP theorem for the formal consistency tradeoffs behind this choice.
The Bottleneck Checklist
Before you defend your design to the interviewer, attack it yourself. Walk through each layer and name what breaks first.
| Layer | Bottleneck | Fix |
|---|---|---|
| Write path | Single hot key | Application-level sharding, random shard selection |
| Write path | Redis memory | Periodic flush to durable storage, short TTLs |
| Read path | Aggregating N shards | Cache the aggregated value with a 5s TTL |
| Durability | Lost increments on crash | Redis AOF persistence or Kafka as event log |
| Network | Write storms on publish | Client-side batching, async flush queue |
| Consistency | Stale reads | Explicit freshness contract in the API response |
The most common architecture mistake is treating the counter like a regular key-value entry. The write pattern is append-only and write-heavy. The read pattern is aggregate-and-cache. Those two shapes want different infrastructure, and the separation between them is what you're designing.
A Kafka-based design handles very high throughput with built-in durability. Clients publish increment events to a topic. Aggregator workers consume, sum, and flush to Redis. The log gives you replay and exactly-once semantics via Kafka Streams. The tradeoff is latency: an increment doesn't appear in the read path until a consumer processes it, which could be a few seconds.
Clients → Kafka (increment events) → Aggregator (Kafka Streams) → Redis (current count)
↘
Cassandra / PostgreSQL (archive)
Use Kafka when durability matters more than sub-second freshness. Use direct Redis sharding when you need low latency and can tolerate a small loss window.
Talk Through Bottlenecks by Minute 25
The clock is a constraint you have to design around, not just the system.
- Minutes 0 to 5: Clarify requirements. Ask the four questions above. Don't start drawing.
- Minutes 5 to 15: Sketch the high-level design. Name the write/read path split. Introduce sharding.
- Minutes 15 to 25: Go deep on data model and API. Show the schema. Derive the shard key math. Mention idempotency.
- Minutes 25 to 35: Bottlenecks. Hot keys. Consistency model and why you picked it.
- Minutes 35 to 42: Tradeoffs. Walk the table. Push on what happens if the requirement changes.
- Minutes 42 to 45: Open for questions. Ask the interviewer what they want to explore further.
The mistake most candidates make is spending 30 minutes on architecture and running out of time before bottlenecks. Interviewers weight tradeoff discussion heavily. A simple design that gets to tradeoffs beats a complete design that doesn't.
Narrate your decisions as you make them. "I'm using random shard selection here instead of consistent hashing because I want writes to spread evenly even on hot counters" is the kind of sentence that shows up verbatim in the write-up. Silence under pressure looks like confusion.
For the consistency question specifically, don't hedge. Name your model, state the tradeoff, and explain why the requirements justified it. Interviewers probe consistency because it forces a real opinion.
If you want to practice the 45-minute version of this out loud, under time pressure, with real follow-up questions, SpaceComplexity runs voice-based system design mocks with rubric-based feedback. Distributed counters come up often.
Recap
- Decouple writes from reads. Increment to shards, aggregate on read, cache the aggregated value.
- Application-level sharding with random shard selection solves hot keys. 100 shards turns 80K writes per second into 800 per shard.
- Choose your consistency model deliberately: buffered for high throughput, G-Counter CRDT for partition tolerance, serialized leader for exactness.
- Kafka adds durability and replay. Redis adds speed. Use both for counters where losing increments is costly.
- In the interview: hit bottlenecks by minute 25. Tradeoffs are how you move from hire to strong hire.