REST vs gRPC: The System Design Interview Decision Guide

- Protocol Buffers over HTTP/2 give gRPC 3 to 4x smaller payloads and up to ~10x higher throughput than JSON over HTTP/1.1 at scale
- gRPC defines four first-class call types (unary, server streaming, client streaming, bidirectional streaming); REST streaming is always a workaround
- Browsers cannot use gRPC natively and gRPC-Web drops client and bidirectional streaming entirely
- REST GET responses cache at every CDN layer; gRPC uses POST for all calls, which CDNs and browsers do not cache
- gRPC requires L7 load balancing in Kubernetes; REST distributes naturally with L4
- The production pattern: REST externally for public APIs, gRPC internally for service-to-service calls
- A strong interview answer names the REST/gRPC architectural boundary explicitly and explains both sides
Most system design interviews ask you to pick one. REST or gRPC. Pick the wrong one and you look like you're cargo-culting. Pick the right one and hand-wave the tradeoffs and you look the same. The difference is whether you can explain why each exists and where the other one breaks.
This guide covers the actual differences, the concrete numbers, and the reasoning pattern that scores points.
The Wire Format Is the Whole Story
REST ships text. gRPC ships binary. That one sentence explains most of the performance gap.
A REST API typically serializes your data as JSON and sends it over HTTP/1.1, one request and one response at a time per TCP connection. It's human-readable, universally supported, and easy to debug with curl. Also: your field names are just... there on the wire. Every single request. The string "timestamp" is 9 bytes that will travel from your server to someone's phone until the heat death of the universe.
gRPC uses Protocol Buffers over HTTP/2. Protocol Buffers encode each field as a tag-value pair in binary, which is typically 3 to 4 times smaller than the equivalent JSON. A stock ticker record that takes 102 bytes in JSON takes 25 bytes in protobuf. Your payload no longer has a payload. At a million requests per minute that difference compounds fast.
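To see where the savings come from, here is a minimal sketch that hand-encodes one record in the protobuf wire format and compares it with compact JSON. The record layout (`symbol`, `price`, `volume`, `timestamp`) and the field numbers 1 to 4 are assumptions made up for illustration; they are not the ticker record measured above, so the byte counts will differ from the 102 and 25 quoted there. The encoder is written from scratch rather than with the official protobuf library.

```python
import json
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """A field's key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_tick(symbol: str, price: float, volume: int, timestamp: int) -> bytes:
    """Hypothetical message: string=1, double=2, uint64=3, uint64=4."""
    sym = symbol.encode("utf-8")
    return b"".join([
        tag(1, 2), encode_varint(len(sym)), sym,  # wire type 2: length-delimited
        tag(2, 1), struct.pack("<d", price),      # wire type 1: 64-bit little-endian
        tag(3, 0), encode_varint(volume),         # wire type 0: varint
        tag(4, 0), encode_varint(timestamp),
    ])


tick = {"symbol": "ACME", "price": 187.42, "volume": 1200, "timestamp": 1700000000}
json_bytes = json.dumps(tick, separators=(",", ":")).encode("utf-8")
proto_bytes = encode_tick(**tick)
print(f"JSON: {len(json_bytes)} bytes, protobuf: {len(proto_bytes)} bytes")
```

The field names never reach the wire in the binary form; each field costs one tag byte plus its value. The exact ratio depends on the data: short strings and small integers shrink the most, while long strings barely shrink at all.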
HTTP/2 matters just as much as the serialization format. gRPC multiplexes all RPC calls over a single persistent TCP connection, compresses headers with HPACK (85 to 95% header size reduction on repeat traffic), and uses binary framing throughout. HTTP/1.1 carries one request at a time per connection, so a busy client either opens more connections, paying the TCP and TLS handshake for each new one, or queues requests behind a small keep-alive pool.
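A back-of-the-envelope model makes the multiplexing point concrete. This is a rough sketch, not a benchmark: it assumes every request costs one network round trip, ignores bandwidth and server time, and uses a pool of 6 connections and a limit of 100 concurrent streams as illustrative defaults (both are assumptions; real clients and servers configure these differently).

```python
import math


def round_trips_http1(requests: int, pool_size: int = 6) -> int:
    """One in-flight request per connection: requests queue behind the pool."""
    return math.ceil(requests / pool_size)


def round_trips_http2(requests: int, max_concurrent_streams: int = 100) -> int:
    """Many streams share one connection, up to the negotiated stream limit."""
    return math.ceil(requests / max_concurrent_streams)


burst = 60  # hypothetical burst of concurrent calls from one client
print("HTTP/1.1 round trips:", round_trips_http1(burst))
print("HTTP/2 round trips:  ", round_trips_http2(burst))
```

Under these assumptions the HTTP/1.1 client needs several waves of requests to drain the burst, while the HTTP/2 client sends everything at once on a single connection.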
The upshot in production:
| Metric | REST (JSON, HTTP/1.1) | gRPC (Protobuf, HTTP/2) |
|---|---|---|
| Payload size | Baseline | 3 to 4x smaller (uncompressed) |
| Serialization speed (Java) | Baseline | 5 to 6x faster |
| Throughput (large payloads) | Baseline | ~10x higher |
| p99 latency (high concurrency) | Baseline | 5 to 11x lower |
| CPU usage | Baseline | ~19% lower |
| Network bandwidth | Baseline | ~41% lower |
These numbers come from production benchmarks under high concurrency and large payloads. At low concurrency with small payloads, the gap is marginal. If your system tops out at fifty users, neither choice matters. But no one asks you to design a system for fifty users.
You Define the Contract in One Place
gRPC is schema-first. You write a .proto file, run the protoc compiler, and get generated client and server stubs in whatever language you need.
```protobuf
service OrderService {
  rpc GetOrder (OrderRequest) returns (OrderResponse);
  rpc StreamUpdates (OrderRequest) returns (stream OrderEvent);
}
```
That file is the contract. Both sides compile against it. Change a field number and you break the wire format. Add a new field and old clients skip it safely. This tight coupling is intentional: in a polyglot microservices environment, you want the compiler to catch API drift, not your on-call engineer at 2am discovering the payments service is now returning amount_cents where it used to return amount.
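The "old clients skip it safely" behavior falls out of the wire format: every field carries its own number and wire type, so a decoder can step over anything it doesn't recognize. Here is a minimal hand-rolled decoder that shows the idea. The message layout (`order_id` as field 1, `status` as field 2, a newly added `currency` as field 3) is hypothetical, and real generated code does this for you.

```python
from typing import Dict, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known(buf: bytes, known: Dict[int, str]) -> Dict[str, object]:
    """Decode the fields listed in `known`; step over every other field number."""
    out: Dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number in known:  # unknown fields were consumed but are dropped
            out[known[field_number]] = value
    return out


# A newer server writes fields 1, 2 and a newly added field 3.
new_message = (
    bytes([0x08, 42])                  # field 1, varint: order_id = 42
    + bytes([0x12, 7]) + b"SHIPPED"    # field 2, string: status
    + bytes([0x1A, 3]) + b"USD"        # field 3, string: currency (new)
)
old_client_schema = {1: "order_id", 2: "status"}
decoded = decode_known(new_message, old_client_schema)
print(decoded)
```

The old client still gets `order_id` and `status` and never notices field 3. Renumbering is the opposite case: if field 2 were reassigned to a different meaning, the same decoder would happily read the wrong value under the old name.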
REST has no enforced contract by default. OpenAPI/Swagger adds schema documentation but no compile-time enforcement. REST is the honor system of API design. Any client that speaks HTTP can call any REST API, which is the feature when you want broad compatibility and the problem when you want strict guarantees.
Streaming Is Where gRPC Wins Decisively
REST has no native streaming primitive. If you need real-time data from a server to a client, your options are Server-Sent Events (one direction only), WebSockets (bidirectional, but now you're building a protocol on top of a raw socket, have fun), or long-polling (which is exactly what it sounds like and exactly as fun as it sounds). All of these are workarounds layered on top of HTTP. See WebSockets vs Long Polling vs SSE for the full comparison of which workaround to reach for.
gRPC defines four call patterns, three of them streaming, as first-class types in the schema, with full type safety and generated stubs for each.
| Pattern | Use case |
|---|---|
| Unary | Standard request-response (same as REST) |
| Server streaming | Live feeds, log tailing, paginated results streamed as they arrive |
| Client streaming | Batch upload, IoT sensor ingestion, ML training data pipelines |
| Bidirectional streaming | Chat, collaborative editing, game state sync, real-time control loops |
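The shape of each pattern is easiest to see as a function signature: a stream is an iterator on the request side, the response side, or both. The sketch below uses plain Python generators with made-up handler names and payloads; it is not the gRPC library, although generated Python servicers follow a similar shape with an extra context argument.

```python
from typing import Dict, Iterable, Iterator


def get_order(order_id: str) -> Dict[str, str]:
    """Unary: one request in, one response out."""
    return {"order_id": order_id, "status": "PLACED"}


def stream_updates(order_id: str) -> Iterator[Dict[str, str]]:
    """Server streaming: one request in, many responses out."""
    for status in ("PLACED", "PACKED", "SHIPPED"):
        yield {"order_id": order_id, "status": status}


def upload_readings(readings: Iterable[float]) -> Dict[str, float]:
    """Client streaming: many requests in, one summary response out."""
    count, total = 0, 0.0
    for reading in readings:
        count += 1
        total += reading
    return {"count": count, "mean": total / count if count else 0.0}


def chat(messages: Iterable[str]) -> Iterator[str]:
    """Bidirectional: responses are produced while requests are still arriving."""
    for message in messages:
        yield f"ack: {message}"


print(get_order("o-1"))
print(list(stream_updates("o-1")))
print(upload_readings(iter([1.0, 2.0, 3.0])))
print(list(chat(iter(["hello", "anyone there?"]))))
```

In a real service the iterators are backed by the network, so the server-streaming and bidirectional handlers can emit a response long before the call finishes.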
Uber uses server streaming for live location tracking. Netflix uses client streaming for telemetry ingestion. Square migrated their payments-fraud platform from REST and WebSockets to bidirectional gRPC streaming and measured a 35% drop in p99 latency along with a 60% reduction in connection count per node.
If your system design involves real-time data flow between services or native clients you control, gRPC streaming is usually the right tool.
Where REST Still Wins
The performance numbers favor gRPC. The use cases don't always.
Browser clients can't use gRPC natively. Browsers expose no API for raw HTTP/2 frame control. gRPC-Web exists as a workaround, but it requires an Envoy or similar proxy to translate between gRPC-Web and gRPC, and it drops client streaming and bidirectional streaming entirely. Browsers are stubborn that way. They've been consuming JSON since before most of your coworkers could drive, and they're not changing for you. For a public-facing web API, REST is still the right choice.
REST responses are CDN-cacheable. gRPC responses are not. REST GET requests with Cache-Control headers cache at every layer: browser, CDN, reverse proxy. Cloudflare, Fastly, and Akamai cache them natively. gRPC uses HTTP POST for every call, including reads. Caches treat POST as uncacheable by default: the HTTP spec allows caching a POST response only when it carries explicit freshness information, and shared caches almost never do it. In practice this is a hard constraint, not a configuration issue. You cannot negotiate with HTTP semantics.
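A toy cache shows why the method matters more than the payload. This is a simplified sketch of default shared-cache behavior, not any CDN's actual logic: it keys on the path only, honors `max-age`, and never stores POST responses. The paths `/orders/42` and `/OrderService/GetOrder` are illustrative.

```python
import re
from typing import Callable, Dict, Tuple

Origin = Callable[[str, str], Tuple[str, str]]  # (method, path) -> (body, Cache-Control)


class TinyEdgeCache:
    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.origin_hits = 0

    def handle(self, method: str, path: str, now: float) -> str:
        if method == "GET":
            cached = self.store.get(path)
            if cached and cached[0] > now:
                return cached[1]  # fresh hit: the origin never sees this request
        self.origin_hits += 1
        body, cache_control = self.origin(method, path)
        max_age = re.search(r"max-age=(\d+)", cache_control)
        if method == "GET" and max_age:  # POST responses are never stored
            self.store[path] = (now + int(max_age.group(1)), body)
        return body


def origin(method: str, path: str) -> Tuple[str, str]:
    """Stand-in origin server that marks every response cacheable for 60 s."""
    return f"{method} {path}", "public, max-age=60"


cache = TinyEdgeCache(origin)
for t in (0, 10, 20):
    cache.handle("GET", "/orders/42", now=t)  # REST-style read
rest_hits = cache.origin_hits

for t in (0, 10, 20):
    cache.handle("POST", "/OrderService/GetOrder", now=t)  # gRPC-style read
grpc_hits = cache.origin_hits - rest_hits

print(f"origin hits for 3 REST reads: {rest_hits}, for 3 gRPC reads: {grpc_hits}")
```

Both origins sent the same `Cache-Control` header. Only the GET path benefits from it, which is why read-heavy public endpoints lean on REST.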
REST is debuggable with tools you already have. A REST call gone wrong shows up as readable JSON in your nginx logs. A gRPC call gone wrong is binary. grpcurl solves this for ad-hoc testing, but it requires server reflection or the .proto file available, and you'd better hope someone put that .proto in the oncall runbook. The tooling gap has narrowed in recent years (Postman added full gRPC support in 2023), but curl plus browser DevTools still wins when something is on fire at 3am.
REST also wins for third-party developer ecosystems. If you're building a public API, most developers expect REST. Stripe, GitHub, AWS, and Twilio all use REST for their public APIs. Not because they don't know about gRPC.
The Load Balancing Trap
This is the gotcha that trips up most system design answers that choose gRPC. You describe the performance benefits, sketch a nice architecture, feel good about yourself, and then walk straight into this.
gRPC multiplexes all calls over a single persistent HTTP/2 connection, which means a Layer 4 load balancer will send all of a client's traffic to the same backend pod. L4 load balancers distribute TCP connections, not requests. If a client opens one connection, every RPC goes to one server. You auto-scale to five pods. Traffic analysis shows one very tired pod and four tourists. Congratulations, you've built the world's most expensive single-threaded server.
REST over HTTP/1.1 opens a new connection per request (or a small pool), so L4 load balancers distribute load naturally. No special configuration. It just works.
For gRPC you need Layer 7 load balancing, which understands HTTP/2 and routes each individual RPC to a backend. In Kubernetes, that means one of:
- Nginx, Envoy, or HAProxy 2.0+ in front of the service
- A service mesh (Istio with Envoy sidecars, Linkerd) that handles per-RPC routing transparently
- Client-side load balancing where the gRPC client queries a service registry directly
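The difference between the two layers is small enough to simulate. The sketch below is a deliberately simplified model with round-robin selection and made-up pod names: the L4 balancer picks a backend once per connection, and the L7 balancer picks one per RPC.

```python
from itertools import cycle
from typing import Dict, List


def l4_balance(clients: int, rpcs_per_client: int, pods: List[str]) -> Dict[str, int]:
    """L4: choose a pod once per TCP connection; every RPC on it follows."""
    load = {pod: 0 for pod in pods}
    next_pod = cycle(pods)
    for _ in range(clients):
        pod = next(next_pod)  # one long-lived HTTP/2 connection per client
        load[pod] += rpcs_per_client
    return load


def l7_balance(clients: int, rpcs_per_client: int, pods: List[str]) -> Dict[str, int]:
    """L7: choose a pod for each individual RPC, whatever connection it used."""
    load = {pod: 0 for pod in pods}
    next_pod = cycle(pods)
    for _ in range(clients * rpcs_per_client):
        load[next(next_pod)] += 1
    return load


pods = [f"pod-{i}" for i in range(5)]
l4_load = l4_balance(clients=1, rpcs_per_client=1000, pods=pods)
l7_load = l7_balance(clients=1, rpcs_per_client=1000, pods=pods)
print("L4:", l4_load)
print("L7:", l7_load)
```

With a single gRPC client, the L4 run puts all 1000 calls on one pod and leaves the other four idle. More clients soften the skew, but a few chatty clients can still pin a handful of pods while the rest sit underused.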
See Load Balancing Algorithms for how the algorithms differ. The point for your interview answer: if you choose gRPC, say explicitly that you'd put an L7 proxy or service mesh in front of it. Not mentioning this is a red flag. It signals you picked gRPC for the vibes, not because you understand what it actually does.
The Architecture That Actually Works at Scale
Google, Netflix, and most large distributed systems use the same pattern. Not because they're trend-following. Because they tried everything else first.
REST externally, gRPC internally.

Public-facing APIs use REST with JSON over HTTPS. Third-party developers get familiar semantics and broad tooling compatibility. CDNs cache the read-heavy endpoints. The API surface is stable and versioned.
Internal service-to-service communication uses gRPC. Services compiled against shared .proto schemas get generated clients in whatever language each team writes. The performance advantage compounds across hundreds of internal calls per user request. Streaming handles the real-time data flows that REST can't.
A strong system design answer makes this boundary explicit and explains why. For a ride-sharing system: the mobile client calls a REST API gateway (browser-compatible, cacheable), which fans out to internal gRPC services for location tracking (server streaming), dispatch (unary), and pricing (unary). The streaming telemetry from drivers flows over a client-streaming gRPC endpoint directly from the mobile SDK.
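The boundary can be sketched in a few lines. Everything here is hypothetical: the handler name, the request fields, and the two coroutines standing in for unary gRPC stubs (a real gateway would call generated clients instead of sleeping). The point is only the shape: JSON at the edge, concurrent internal calls behind it.

```python
import asyncio
import json
from typing import Dict


async def pricing_quote(pickup: str, dropoff: str) -> Dict[str, int]:
    """Stand-in for a unary gRPC call to an internal pricing service."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"fare_cents": 1850}


async def dispatch_match(pickup: str) -> Dict[str, object]:
    """Stand-in for a unary gRPC call to an internal dispatch service."""
    await asyncio.sleep(0.01)
    return {"driver_id": "d-17", "eta_seconds": 240}


async def handle_post_rides(body: str) -> str:
    """REST edge: parse JSON, fan out to internal services, reply with JSON."""
    request = json.loads(body)
    quote, match = await asyncio.gather(  # both internal calls run concurrently
        pricing_quote(request["pickup"], request["dropoff"]),
        dispatch_match(request["pickup"]),
    )
    return json.dumps({**quote, **match})


response = asyncio.run(
    handle_post_rides(json.dumps({"pickup": "A", "dropoff": "B"}))
)
print(response)
```

The public contract is the JSON in and out of `handle_post_rides`. Everything behind it can change transport, schema, or language without third-party clients noticing.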
If the interviewer asks where GraphQL fits: it belongs at the API gateway layer when the client needs to aggregate data from multiple internal services in a single request, avoiding the N+1 REST call problem. It does not replace gRPC internally, and it adds resolver overhead that makes it the wrong choice for raw service-to-service calls.
How to Choose REST vs gRPC in a System Design Interview
You will rarely be asked "REST or gRPC?" directly. You'll be asked to design a system and expected to choose. The signals that should push you toward gRPC:
- Internal service-to-service calls in a controlled environment
- Streaming requirements of any kind
- High throughput or hard latency requirements (ML inference pipelines, financial transaction processing)
- Mobile clients with bandwidth constraints
- Polyglot teams that need schema contracts enforced by the compiler
The signals that should keep you on REST:
- Public-facing API consumed by third-party developers or browsers
- CDN caching is a meaningful part of your scalability answer
- Simple CRUD with no streaming needs
- Debuggability and observability matter more than raw performance
The answer the interviewer is actually scoring: you named the tradeoffs, you picked the right tool for each boundary in your architecture, and you mentioned the L7 load balancing requirement if you chose gRPC.
The Short Version
- Protocol Buffers over HTTP/2 give gRPC 3 to 4x smaller payloads and up to ~10x higher throughput than JSON over HTTP/1.1 at scale
- gRPC's four call types (unary plus three streaming modes) are first-class and type-safe; REST streaming is always a workaround
- gRPC does not work natively in browsers (gRPC-Web loses client and bidirectional streaming)
- gRPC POST semantics mean no CDN caching; REST GET responses cache natively
- gRPC requires L7 load balancing in Kubernetes; L4 load balancing sends all traffic to one pod
- The production pattern is REST externally, gRPC internally
- A strong interview answer names the boundary and explains both sides