Design a Digital Wallet: The System Design Interview Walkthrough

May 27, 202612 min read
interview-prepcareerdsaalgorithms
Design a Digital Wallet: The System Design Interview Walkthrough
TL;DR
  • Double-entry bookkeeping: every transfer writes two ledger entries (debit + credit); the balance is a derived value, never directly updated
  • Money as integers: store amounts in cents (BIGINT), never floats, to avoid floating-point rounding errors across millions of transactions
  • Idempotency keys: a UNIQUE constraint on transactions.idempotency_key makes retried requests safe without executing the transfer twice
  • Deadlock prevention: always acquire SELECT FOR UPDATE locks in sorted account-ID order so concurrent transfers can never deadlock each other
  • Webhook-driven top-up and cash-out: write a PENDING record first, call the external gateway, then credit or debit only after the webhook confirms success
  • Scale in tiers: a single PostgreSQL primary handles ~500 TPS; add a read replica for balance reads; shard by user_id only when you need to go beyond that

Alice sends Bob $50. That one sentence hides a minefield. The money has to leave Alice's account and arrive in Bob's account atomically. No intermediate state where it's in neither account, or worse, in both. At scale, hundreds of those operations happen every second, on the same accounts, at the same time.

In the interview, you draw five neat boxes in the first few minutes. The interviewer nods. Then they ask what happens if the server crashes between the debit and the credit. That's where the fun starts.

The digital wallet system design interview comes up at Google, Stripe, Meta, and every fintech company. It's a focused correctness test dressed up as an architecture question.


What You're Actually Building

Before touching architecture, nail the scope. A digital wallet does four things:

  • Top-up: load money from an external source (bank account, debit card)
  • P2P transfer: move money between two wallet accounts instantly
  • Cash-out: withdraw money to an external bank account
  • Balance inquiry: read a current balance

Everything else (fraud detection, loyalty points, multi-currency) is a follow-on. The instinct to add features before nailing the core is strong. Scope it down unless the interviewer asks.

The non-functional requirements are where the real constraints live. Every transaction must be atomic (money can't vanish mid-transfer), idempotent (retrying a network timeout can't double-charge), and auditable (you need an immutable record for compliance). Availability is high but consistency is non-negotiable. Eventual consistency is not acceptable here.

For scale, a mid-size wallet like Venmo handles roughly 70 million users and perhaps a few million transactions per day. That's around 50 TPS average, maybe 500 TPS peak. Not astronomical. A single relational database can handle the write path without sharding. File that away.


Five Boxes Is Enough

The Wallet Service is the only box that touches the database on the write path. It handles transfer logic, balance checks, idempotency, and ledger writes. The Payment Gateway is an external dependency for top-ups and cash-outs (think Stripe or ACH rails). The Notification Service is fire-and-forget.

Five-box digital wallet architecture: Client, API Gateway, Wallet Service, PostgreSQL primary, Payment Gateway, Read Replica, Notification Service with directional arrows and webhook callback The Wallet Service owns the write path. The webhook loop handles async confirmation from the payment gateway.

Two things to justify immediately when you draw this:

  1. Why PostgreSQL and not Cassandra? Because you need multi-row ACID transactions. When you debit Alice and credit Bob, those two writes must either both commit or both roll back. Cassandra can't give you that.

  2. Why not microservices? A distributed transaction crossing service boundaries requires either 2PC (a distributed locks nightmare) or Saga (eventual consistency that the interviewer will probe you on). For core money movement, a single service owning a single database is simpler and safer. Split out top-up and cash-out into separate services later. Keep the ledger monolithic.

Draw ten boxes and you'll spend the interview explaining service mesh configuration instead of money movement. Don't do it.


The Data Model: Don't Update Balances

You never UPDATE a balance. You only INSERT ledger entries, and the balance is a derived value. This is the most important decision in the design.

This is double-entry bookkeeping, the 500-year-old accounting principle. Venetian merchants used it for silk and spice. You're going to use it for splitting a dinner tab at scale. Every transaction produces exactly two ledger entries: one debit and one credit. The sum of all credits minus all debits for an account is its balance.

Three tables:

CREATE TABLE accounts ( id UUID PRIMARY KEY, user_id UUID NOT NULL UNIQUE, currency CHAR(3) NOT NULL DEFAULT 'USD', balance BIGINT NOT NULL DEFAULT 0, -- cached, in cents version INT NOT NULL DEFAULT 0, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); CREATE TABLE transactions ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), from_account_id UUID NOT NULL REFERENCES accounts(id), to_account_id UUID NOT NULL REFERENCES accounts(id), amount BIGINT NOT NULL CHECK (amount > 0), -- in cents status TEXT NOT NULL DEFAULT 'PENDING', -- PENDING, COMPLETED, FAILED idempotency_key TEXT NOT NULL UNIQUE, created_at TIMESTAMPTZ NOT NULL DEFAULT now() ); CREATE TABLE ledger_entries ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), transaction_id UUID NOT NULL REFERENCES transactions(id), account_id UUID NOT NULL REFERENCES accounts(id), direction TEXT NOT NULL CHECK (direction IN ('DEBIT', 'CREDIT')), amount BIGINT NOT NULL CHECK (amount > 0), created_at TIMESTAMPTZ NOT NULL DEFAULT now() );

Amounts are integers in cents, never floats. Floating-point math loses pennies. The balance column on accounts is a denormalized cache for fast reads, not the source of truth. The source of truth is ledger_entries. The version column enables optimistic locking. The idempotency_key unique constraint is the entire duplicate-prevention mechanism.

Double-entry bookkeeping diagram showing one TRANSACTIONS row splitting into two LEDGER_ENTRIES rows: a DEBIT on Alice and a CREDIT on Bob, with balance equations below each entry One transfer, two ledger entries. The balance is always derived from the ledger, never stored in isolation.


Walk the Transfer Algorithm Slowly

Don't skip steps here. This is where interviewers probe. "Just wrap it in a transaction" is not an answer.

def transfer(from_id, to_id, amount, idempotency_key): with db.transaction(isolation='REPEATABLE READ'): # Step 1: Idempotency check existing = db.query( "SELECT id, status FROM transactions WHERE idempotency_key = %s", idempotency_key ) if existing: return existing # Return cached result, do nothing # Step 2: Lock both accounts in deterministic order to prevent deadlock lower_id, higher_id = sorted([from_id, to_id]) accounts = db.query( "SELECT id, balance, version FROM accounts " "WHERE id IN (%s, %s) ORDER BY id FOR UPDATE", lower_id, higher_id ) sender = next(a for a in accounts if a.id == from_id) # Step 3: Check balance if sender.balance < amount: raise InsufficientFundsError() # Step 4: Insert transaction record txn_id = db.execute( "INSERT INTO transactions (from_account_id, to_account_id, amount, " "status, idempotency_key) VALUES (%s, %s, %s, 'COMPLETED', %s) " "RETURNING id", from_id, to_id, amount, idempotency_key ) # Step 5: Write two ledger entries db.execute( "INSERT INTO ledger_entries (transaction_id, account_id, direction, amount) " "VALUES (%s, %s, 'DEBIT', %s), (%s, %s, 'CREDIT', %s)", txn_id, from_id, amount, txn_id, to_id, amount ) # Step 6: Update cached balances db.execute( "UPDATE accounts SET balance = balance - %s, version = version + 1 " "WHERE id = %s", amount, from_id ) db.execute( "UPDATE accounts SET balance = balance + %s, version = version + 1 " "WHERE id = %s", amount, to_id )

Three places the interviewer will probe:

Deadlock prevention. Thread A locks Alice then Bob. Thread B simultaneously locks Bob then Alice. They wait for each other forever. Your SLA does not wait. The fix is to always acquire locks in a consistent order, sorted by account ID. This is non-negotiable. Show you know this.

The idempotency check comes first. Before any balance check, look up the idempotency key. If it exists, return the existing result. A client can retry after a network timeout without risk of executing the transfer twice.

SELECT FOR UPDATE vs. optimistic locking. For a wallet, pessimistic locking (SELECT FOR UPDATE) is the right default. Yes, it serializes concurrent writes to the same account. That's the point. The alternative, optimistic locking with a version check, aborts and retries on conflict. Under heavy contention it hammers the database with retries. For financial writes, that serialization is the correct behavior.


Top-Up and Cash-Out: The External Complication

P2P transfer is entirely in-process. Top-up and cash-out cross a network boundary to a payment gateway, which means you can't wrap the whole thing in one database transaction.

For top-up, the flow is:

  1. User initiates top-up. Write a PENDING transaction to your database.
  2. Call the payment gateway. If the call fails (network timeout), do not credit the wallet yet.
  3. The gateway calls your webhook when the charge succeeds.
  4. Webhook handler: credit the wallet, mark the transaction COMPLETED.

The gap between steps 2 and 4 is where things break. This is not theoretical. It's Tuesday. Your webhook must be idempotent. The gateway can call it multiple times. Use the gateway's event ID as an idempotency key.

For cash-out, mirror the flow but invert the order of financial risk: debit the wallet first, then initiate the bank transfer. If the bank transfer fails, issue a compensating credit back to the wallet. This is the simplest form of a Saga pattern: two steps, each with a compensating action. This two-phase structure (write a pending record, then confirm asynchronously) is the same pattern that runs through every payment system, covered in depth in the payment systems design walkthrough.

Top-up sequence diagram with two columns: Wallet Service and Payment Gateway. Shows the async webhook callback pattern with a crash-danger zone annotation between the charge call and the webhook confirmation Money enters the wallet only after the webhook confirms success. For cash-out, flip it: debit first, then transfer.


Idempotency: Say It Out Loud

Every state-mutating endpoint in your API needs an idempotency key. Show the interviewer you know this before they ask.

The client generates a UUID (or uses a request ID). The server inserts it into the transactions.idempotency_key column, which has a UNIQUE constraint. The first insert succeeds. A concurrent or retried request hits the unique constraint, sees the existing transaction, and returns the same response.

Stripe's idempotency implementation works the same way: client-provided key, server caches the response, network retries are safe by default.

The subtle trap tends to surface at 2 AM: if your server crashes after writing the idempotency key but before completing the transfer, the client retries, hits the existing key, and gets back the original (failed) transaction. Your system needs to distinguish between "completed" and "failed" idempotency key entries and only short-circuit on completed ones.


Where the Cracks Appear

At low volume, a single PostgreSQL primary handles everything. At 500 TPS writes, you're still comfortable. Here's where pressure builds at scale:

Hot accounts. A popular merchant might receive thousands of concurrent transfers. Every one locks the same row. They queue up politely behind each other. Your p99 latency does not queue politely. For a single-account bottleneck, consider sharding the account balance across multiple sub-accounts and summing on read (balance partitioning). The hot-row problem shows up in any inventory-like system. Ticketmaster's seat reservation design hits exactly the same bottleneck on popular events.

Read volume. Balance inquiries are read-heavy and don't need the primary. Point them at a read replica. Since the replica has slightly stale data, show the cached balance from the accounts table for speed, not the ledger sum.

Database sharding. If you need to shard, the natural key is user_id. All accounts for a user land on one shard. P2P transfers within the same shard remain single-database transactions. Cross-shard transfers require a Saga.

Scaling tiers comparison table showing Single Primary (0-500 TPS, low complexity), Primary plus Read Replica (500-5K TPS, medium complexity), and Sharded (5K+ TPS, high complexity with Saga required) Scale in tiers. Premature sharding is a distributed-transaction tax you pay for years.


How to Spend 45 Minutes

TimePhase
0-5 minClarify scope: top-up, P2P, cash-out; identify consistency vs availability tradeoff
5-12 minHigh-level architecture: 5 boxes, justify PostgreSQL, explain why not microservices for core ledger
12-22 minData model: three tables, double-entry insight, amounts as integers, idempotency_key constraint
22-33 minTransfer algorithm: step-by-step, SELECT FOR UPDATE, deadlock prevention by ordering locks, idempotency check first
33-40 minTop-up/cash-out: webhook pattern, saga for compensating transactions
40-45 minScaling: hot accounts, read replicas, when to shard

The most common mistake is jumping straight to microservices and Saga coordination for the core transfer. Engineers who've been reading too many distributed systems papers do this. It introduces distributed transaction complexity you don't need when a single service plus a single database handles the correctness requirements cleanly.


The Tradeoffs You Should Name

SELECT FOR UPDATE vs. optimistic locking. Pessimistic is safer under contention, optimistic scales better when conflicts are rare. For financial writes, pessimistic wins. Say why.

Cached balance vs. ledger sum. Cached balance is O(1) but requires careful maintenance. Ledger sum is always correct but O(n) in transaction history. Use the cache for display, use the ledger as the audit source of truth. They should always agree. A nightly reconciliation job catches any divergence.

Monolith vs. microservices. The ledger core stays monolithic. The external integrations (payment gateway adapters, notification delivery) can be separate services. The boundary is clear: anything that touches two wallet accounts in one operation must stay inside one service boundary. That tension is covered in the push vs. pull tradeoff analysis.


How SpaceComplexity Can Help

System design rounds feel different from coding rounds. There's no compiler to check you. The interviewer is watching how you structure an answer under pressure, how you navigate tradeoffs, how you respond when they poke at your architecture. Voice-based mock interviews at SpaceComplexity let you practice this format specifically, with rubric-based feedback on whether you covered the right dimensions and how you handled pushback.


Recap

  • Scope down to top-up, P2P transfer, cash-out, and balance inquiry. Get agreement before architecting.
  • Core principle: never update a balance. Append ledger entries. The balance is a derived view.
  • Use double-entry bookkeeping: every transaction produces exactly two ledger rows.
  • Store money as integers (cents). Never use floats.
  • Idempotency key with a UNIQUE constraint is how you make network retries safe.
  • SELECT FOR UPDATE for pessimistic locking. Always acquire locks in a consistent order (sorted by ID) to prevent deadlocks.
  • External top-up and cash-out use a webhook callback and Saga pattern for compensation.
  • Scale in tiers: single primary, then read replicas, then shard by user_id only when you need to.

Further Reading