Design an Online Code Editor: The System Design Interview Walkthrough

You open a browser tab, type some Python, hit Run, and get output back in two seconds. Simple. Except the system behind that is one of the harder design problems in the interview circuit, because it involves four separate engineering domains at once: real-time collaboration, secure sandboxing, file persistence, and language intelligence. Most candidates pick one and quietly hope the interviewer doesn't notice the other three.

This walkthrough covers the full 45-minute arc: requirements, the five-box architecture, the key decisions, and the bottlenecks your interviewer already knows about and is waiting for you to name.

Clarify Scope Before You Draw Anything

The phrase "online code editor" spans a huge range. A LeetCode-style judge is nothing like Replit, which is nothing like VS Code in the browser. Spend the first five minutes pinning down exactly which one you're building, because the architecture changes dramatically depending on the answer.

Questions that change the whole architecture:

Single user or real-time collaborative? Collaboration adds a sync layer (OT, operational transformation, or a CRDT, conflict-free replicated data type) and a completely different message path.
Execution model: run-and-return (like a coding judge) or persistent terminal session (like a full IDE)?
Which languages? Interpreted languages like Python are easy. Compiled languages like C++ need a build step. Arbitrary language support means containerizing dozens of runtimes.
Is workspace state ephemeral or persistent? Do projects survive browser close?
Scale target: 10K concurrent active sessions, 100K, 1M?

For this walkthrough, assume a medium-complexity product: multi-language support, persistent projects, terminal access, no real-time collaboration. The collaborative sync layer belongs to the collaborative editor design, which is a separate walkthrough. Target: 100K concurrent sessions.

Draw Five Boxes, Then Connect Them

Here's the full architecture. Study this before reading the rest.

Figure: Five-box architecture for an online code editor. The browser (Monaco and xterm.js) connects down to the API gateway and auth, which fans out to the workspace orchestrator, file service, and LSP service. The workspace orchestrator provisions a Firecracker microVM below.

The browser talks to the API gateway. The gateway fans out to three backend services. Only the workspace orchestrator reaches down into the execution environment.

The execution environment and the file storage are two separate concerns that meet inside the workspace. Files live in persistent object storage. Execution happens in ephemeral sandboxes. The workspace orchestrator stitches them together at session start.

The Sandbox Is the Hard Problem

This is what most candidates rush past, and it's also the decision that matters most.

If you let user code run in a plain Docker container on a shared host, one malicious fork bomb or container-escape CVE exposes every other user on that machine. You are literally running one stranger's code next to other strangers' code on the same kernel. That sentence should terrify you.

Three options exist, in increasing isolation order:

Figure: Sandbox isolation tier comparison. Containers with seccomp share a host kernel, gVisor intercepts syscalls in user space, and Firecracker microVMs each get their own dedicated Linux kernel.

Containers share a kernel. gVisor adds a user-space kernel layer. Firecracker gives every user their own kernel entirely. Each step right adds isolation and a small amount of overhead.

Containers with seccomp. A Docker container plus a strict seccomp profile (allowlisting only needed syscalls) stops most casual abuse. Boot time under 50ms, minimal memory overhead. But all containers share the host kernel. A kernel exploit compromises the host.

gVisor. Google's solution: intercept every syscall in user space via a component called the Sentry (written in Go). The application never talks to the host kernel directly, so you get a much stronger boundary than a plain container without a full VM. The tradeoff is 10-30% I/O overhead and moderate complexity. Used in Google Cloud Run.

MicroVMs (Firecracker). Each sandbox gets its own dedicated Linux kernel. Firecracker, AWS's open-source VMM written in Rust, boots a microVM in under 125ms with about 5MB of memory overhead per instance. This is what AWS Lambda uses. CodeSandbox runs over 150,000 new Firecracker microVMs per month and can resume a VM from a memory snapshot in under 500ms.
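To make the first tier concrete: a "strict seccomp profile" is a deny-by-default allowlist. The sketch below builds one in the JSON layout Docker accepts for custom seccomp profiles. The syscall list is a hypothetical, deliberately tiny example. A real interpreter needs many more syscalls, and the right set has to be measured per runtime.

```python
import json

# Hypothetical allowlist for illustration only. Real runtimes need far more
# syscalls than this, and the exact set must be measured per language.
ALLOWED_SYSCALLS = [
    "read", "write", "close", "fstat", "mmap", "munmap", "brk", "exit_group",
]


def build_seccomp_profile(allowed):
    """Return a deny-by-default profile in Docker's seccomp JSON layout."""
    return {
        # Any syscall that is not explicitly listed fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Serialized form, ready to be written to a profile file.
profile_json = json.dumps(build_seccomp_profile(ALLOWED_SYSCALLS), indent=2)
```

Written to a file, a profile like this is passed to Docker with --security-opt seccomp=<path>. It narrows what user code can ask the kernel to do, but the kernel is still shared, which is the limitation the other two tiers address.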

For a general-purpose online IDE at scale, Firecracker is the right answer. Container escapes are a real and recurring CVE class. Giving each user their own kernel means a guest kernel exploit compromises only that user's VM, and reaching the host would take a hypervisor escape on top. The 125ms cold-start cost is worth it, and pre-warmed pools eliminate most of it (covered in the bottlenecks section).

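Resource limits per sandbox: 1 vCPU, 512MB RAM, 5GB ephemeral disk, 30-second execution timeout for one-shot runs, longer for persistent sessions. The VM's configured vCPU count and memory size set the hard ceiling. Apply cgroups on top (on the host-side VMM process and inside the guest) to cap CPU and memory more finely.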
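Here is a minimal sketch of how those limits could turn into VM configuration. It builds the request bodies an orchestrator might send to Firecracker's API socket before starting the VM. The endpoint and field names follow the Firecracker API as I understand it, so check them against the current API spec. SandboxLimits and the file paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxLimits:
    """Per-sandbox limits from the text. The class itself is hypothetical."""
    vcpu_count: int = 1
    mem_size_mib: int = 512
    disk_gib: int = 5        # enforced by the size of the backing file, not by an API field
    run_timeout_s: int = 30  # enforced by the orchestrator, not by the VMM


def firecracker_requests(limits, kernel_path, rootfs_path):
    """Build (method, path, body) tuples for the VM's API socket, in order."""
    return [
        ("PUT", "/machine-config", {
            "vcpu_count": limits.vcpu_count,
            "mem_size_mib": limits.mem_size_mib,
        }),
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]
```

The point for the interview is the split of responsibility: CPU and memory are fixed when the VM is configured, disk is bounded by the backing file, and the timeout lives outside the VM where user code cannot tamper with it.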

The Terminal Needs Its Own Bridge

A persistent terminal (not just run-and-return output) requires a different plumbing path. The underlying technology here is ancient. PTYs (pseudo-terminals) were designed in the 1970s. They work spectacularly well in your browser in 2026. That's either impressive engineering or a sign that nobody has had a better idea. Probably both.

Figure: Terminal bridge. Browser xterm.js sends keypresses upstream over WebSocket to a gateway that writes to a PTY. PTY output flows back downstream to xterm.js for rendering, and resize events trigger ioctl calls.

Every single keypress makes this round trip. Window resize events trigger an ioctl call inside the VM. The load balancer must use sticky routing or this entire chain breaks.

Inside the microVM: spawn a shell, attach it to a PTY. The PTY gives you full interactive terminal semantics: control characters, terminal resize, ANSI escape codes.

On the server: a gateway process (Node.js or Go) creates the PTY inside the VM via something like node-pty, then bridges it to a WebSocket connection from the browser.

In the browser: xterm.js renders the VT100 output with pixel-perfect accuracy. Every keypress goes upstream over the WebSocket. Every byte the PTY produces comes back downstream and gets rendered.

When the user resizes the browser window, the frontend sends a resize message over the same WebSocket. The server calls ioctl(fd, TIOCSWINSZ, ...) on the PTY. Applications that query terminal size (vim, htop) see the correct dimensions immediately.
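The resize path is small enough to sketch in full. The code below uses Python's standard pty, fcntl, and termios modules on a Unix host to apply a resize message to a PTY. The message shape ({"type": "resize", "rows": ..., "cols": ...}) and the function names are assumptions for illustration, not a fixed protocol.

```python
import fcntl
import os
import pty
import struct
import termios


def set_winsize(fd, rows, cols):
    """Tell the kernel that the terminal behind fd is now rows x cols."""
    # struct winsize holds four unsigned shorts: rows, cols, xpixel, ypixel.
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_winsize(fd):
    """Read back the size, the way vim or htop would query it."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    rows, cols, _, _ = struct.unpack("HHHH", packed)
    return rows, cols


def open_terminal(rows, cols):
    """Allocate a PTY pair with an initial size. Returns (master_fd, slave_fd)."""
    master_fd, slave_fd = pty.openpty()
    set_winsize(master_fd, rows, cols)
    return master_fd, slave_fd


def handle_resize_message(master_fd, message):
    """Apply a hypothetical resize message from the browser. Returns True if handled."""
    if message.get("type") != "resize":
        return False
    set_winsize(master_fd, int(message["rows"]), int(message["cols"]))
    return True
```

The gateway holds the master side and the shell runs on the slave side. Setting the size on the master is what makes the kernel signal the foreground program that its window changed.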

One WebSocket per active terminal session. Sessions are stateful. Your load balancer needs sticky routing so reconnects land on the right gateway process. A reconnect that lands on the wrong gateway sees no PTY and no shell. The user sees a broken terminal. Not great.

Autocomplete Runs Its Own Server

Autocomplete and inline error highlighting require a Language Server. Microsoft's Language Server Protocol (LSP), introduced with VS Code in 2016, standardizes the editor-server interface: the editor sends JSON-RPC 2.0 requests ("give me completions at line 14, col 7") and the server responds with structured results.
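To show what "the editor sends JSON-RPC 2.0 requests" looks like on the wire, here is a minimal sketch of LSP message framing and a completion request. The helper names and the file URI are hypothetical. The Content-Length framing and the textDocument/completion method come from the LSP specification.

```python
import json


def encode_message(payload):
    """Frame a JSON-RPC payload the way LSP expects: header, blank line, body."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(frame):
    """Parse one framed message back into a dict."""
    header, _, rest = frame.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


def completion_request(request_id, uri, line, col):
    """Build a completion request from a 1-based editor line and column."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/completion",
        "params": {
            "textDocument": {"uri": uri},
            # LSP positions are zero-based. Subtracting 1 from the column is
            # only exact for ASCII text, since the default unit is UTF-16
            # code units.
            "position": {"line": line - 1, "character": col - 1},
        },
    }
```

The request for "line 14, col 7" therefore carries position {"line": 13, "character": 6}. The gateway never needs to understand the body. It only has to respect the framing when proxying bytes in both directions.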

Each language has its own LSP server: pyright or pylsp for Python, clangd for C/C++, typescript-language-server for TypeScript. These are not lightweight. clangd on a real C++ project can consume 200-500MB of RAM. pyright runs around 100-200MB. Multiply that by 100K sessions and you've just invented the world's most expensive autocomplete feature.

The right model: one LSP server per workspace, spawned on workspace start, killed on workspace idle. The LSP server runs inside (or adjacent to) the microVM so it has access to the actual project files and installed packages. The gateway proxies the JSON-RPC messages between Monaco and the LSP server over a named pipe or local socket.

Do not share LSP servers across users. The server indexes your files and holds state about your project. Sharing it would leak code between users and produce wrong completions. The per-workspace cost is real, which is why you only start the LSP server on demand and kill it after 5 minutes of no editor activity.
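The on-demand start and idle kill can be sketched as a small per-workspace registry. Everything here is an assumption for illustration: the LspManager name, the spawn and kill callables (which would start and stop a real language server), and the injectable clock that makes the logic testable.

```python
import time

IDLE_TIMEOUT_S = 5 * 60  # kill after 5 minutes of no editor activity


class LspManager:
    """One language server per workspace, started lazily and reaped when idle."""

    def __init__(self, spawn, kill, clock=time.monotonic):
        self._spawn = spawn    # hypothetical callable: workspace_id -> server handle
        self._kill = kill      # hypothetical callable: server handle -> None
        self._clock = clock
        self._servers = {}     # workspace_id -> (handle, last_activity_time)

    def on_editor_activity(self, workspace_id):
        """Return this workspace's server, starting it on first use."""
        entry = self._servers.get(workspace_id)
        handle = entry[0] if entry else self._spawn(workspace_id)
        self._servers[workspace_id] = (handle, self._clock())
        return handle

    def reap_idle(self):
        """Kill servers idle for the full timeout. Returns the reaped workspace ids."""
        now = self._clock()
        idle = [
            ws for ws, (_, last) in self._servers.items()
            if now - last >= IDLE_TIMEOUT_S
        ]
        for ws in idle:
            handle, _ = self._servers.pop(ws)
            self._kill(handle)
        return idle
```

A periodic task would call reap_idle. Because the registry is keyed by workspace, there is no code path that hands one user's server to another.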

Files Persist. Workspaces Don't.

Projects must survive browser close. But you cannot keep microVMs running for every project indefinitely. The solution is to treat files and compute as completely separate layers that only meet during an active session.

Data model:

projects (id, user_id, name, language, created_at)
files    (id, project_id, path, size, updated_at)
sessions (id, project_id, vm_id, status, last_active_at)

Files live in object storage (S3 or equivalent), keyed by project_id/path. The metadata DB (Postgres) tracks the file tree and session state.
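One detail worth saying out loud: the path in that key comes from the user, so it needs normalizing before it becomes part of an object key. A minimal sketch, assuming POSIX-style project paths. The object_key helper is hypothetical.

```python
import posixpath


def object_key(project_id, path):
    """Map a project-relative file path to its object storage key."""
    normalized = posixpath.normpath(path.replace("\\", "/"))
    escapes_project = (
        normalized.startswith("/")
        or normalized == ".."
        or normalized.startswith("../")
    )
    if normalized == "." or escapes_project:
        # Reject empty paths, absolute paths, and anything that climbs out
        # of the project's own prefix.
        raise ValueError(f"invalid project path: {path!r}")
    return f"{project_id}/{normalized}"
```

With this, object_key("p1", "src/./main.py") yields "p1/src/main.py", while a path like "../other-project/secrets.py" is rejected instead of resolving into another project's prefix.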

Here's the full lifecycle, from "user clicks open" to "VM gets recycled":

Figure: Workspace lifecycle flowchart. The user opens a project, the system checks for a running session, claims a pre-warmed VM from the pool, and mounts files from S3. The active session flushes every 30 seconds, and an idle timeout triggers a snapshot and releases the VM back to the pool.

The pre-warm pool is what makes the first second feel fast. Without it, every session open is a cold boot. With it, claiming a VM takes milliseconds.

30 seconds is a sensible default flush interval. Too frequent (every second) burns S3 PUT budget at scale. Too infrequent (every 5 minutes) loses changes on crash. Supplement with a save button for immediate flush and a session recovery log in Redis for the last N changes.

The Bottlenecks Your Interviewer Is Waiting For

State these proactively. Interviewers are not impressed by architectures that only work in ideal conditions.

Cold starts. A microVM booting from scratch takes 125ms at Firecracker's best, but image pull and file sync can push total workspace start to 3-5 seconds cold. Fix: pre-warm a pool of idle VMs by language (Python-ready, Node-ready, etc.). Store VM memory snapshots so common states can resume in 500ms instead of booting fresh.
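The pre-warm pool is simple to sketch: keep a target number of idle VMs per language, hand one out on claim, and fall back to a cold boot when the pool is empty. WarmPool and the boot_vm callable are hypothetical names. In a real system boot_vm would start a VM or restore one from a snapshot.

```python
from collections import defaultdict, deque


class WarmPool:
    """Idle VMs per language, refilled in the background, claimed on session open."""

    def __init__(self, boot_vm, target_per_language):
        self._boot_vm = boot_vm                   # hypothetical callable: language -> vm id
        self._target = dict(target_per_language)  # e.g. {"python": 50, "node": 20}
        self._idle = defaultdict(deque)           # language -> idle vm ids

    def refill(self):
        """Boot VMs until every language is back at its target. Returns the count booted."""
        booted = 0
        for language, target in self._target.items():
            while len(self._idle[language]) < target:
                self._idle[language].append(self._boot_vm(language))
                booted += 1
        return booted

    def claim(self, language):
        """Return (vm_id, was_warm). Falls back to a cold boot on a miss."""
        idle = self._idle[language]
        if idle:
            return idle.popleft(), True
        return self._boot_vm(language), False
```

The was_warm flag is worth tracking as a metric. The warm-hit rate tells you whether the per-language targets are sized correctly for current traffic.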

LSP memory pressure. At 100K concurrent sessions with LSP servers running, you're looking at 10-50TB of RAM if LSP is always on. Fix: start LSP servers lazily (first time the user triggers a completion), kill after 5 minutes of no editor activity, and limit LSP server max RAM with cgroups.
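The 10-50TB figure is just sessions times per-server memory, using the 100-500MB range quoted earlier. A short worked version, where active_fraction is a hypothetical knob for how many sessions have a live LSP server once lazy start and idle kill are in place:

```python
SESSIONS = 100_000
MB_PER_TB = 1_000_000  # decimal units, matching the round numbers in the text


def fleet_ram_tb(sessions, mb_per_server, active_fraction=1.0):
    """Total LSP memory in TB if active_fraction of sessions have a server running."""
    return sessions * active_fraction * mb_per_server / MB_PER_TB


always_on_low = fleet_ram_tb(SESSIONS, 100)   # 10.0 TB at 100MB per server
always_on_high = fleet_ram_tb(SESSIONS, 500)  # 50.0 TB at 500MB per server

# Assumption for illustration: only 20% of sessions have a live server at once.
lazy_high = fleet_ram_tb(SESSIONS, 500, active_fraction=0.2)
```

Under that assumed 20% figure the worst case drops by a factor of five. The real fraction depends on user behavior and should be measured, not guessed.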

File sync fan-out. 100K active sessions each flushing one changed file every 30 seconds is roughly 3,300 S3 PUTs per second. That's fine for S3, which scales request rate per key prefix, and these keys spread across project IDs. Still, debounce and batch per-file to avoid redundant writes on files that changed multiple times within the window.
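Per-file batching can be as simple as a dirty map that keeps only the latest content per key and is drained once per flush interval. A minimal sketch, where FlushBuffer and the put_object callable are hypothetical stand-ins for the real storage client:

```python
def put_rate(sessions, interval_s, files_per_flush=1):
    """Average PUTs per second if every session flushes on every interval."""
    return sessions * files_per_flush / interval_s


class FlushBuffer:
    """Coalesces writes so each changed file costs one PUT per flush window."""

    def __init__(self, put_object):
        self._put = put_object  # hypothetical callable: (key, content) -> None
        self._dirty = {}        # key -> latest content since the last flush

    def record_write(self, key, content):
        # A later write to the same key replaces the earlier one.
        self._dirty[key] = content

    def flush(self):
        """Upload each dirty file once. Returns the number of PUTs issued."""
        pending, self._dirty = self._dirty, {}
        for key, content in pending.items():
            self._put(key, content)
        return len(pending)
```

put_rate(100_000, 30) comes out at about 3,333 per second, which matches the estimate above. This sketch drops the batch if a PUT raises, so a production version would re-queue failed keys.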

WebSocket connection affinity. Terminal sessions are stateful. When you horizontally scale the gateway tier, a reconnect must land on the right process (or you need to re-establish the PTY). Options: sticky sessions at the load balancer (L4 or L7), or a forwarding proxy layer that knows which gateway holds which session. The forwarding proxy is cleaner for production because it survives gateway restarts.
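The forwarding-proxy option boils down to a lookup table from session to gateway. Here is a minimal in-memory sketch. SessionRouter is a hypothetical name, and a real deployment would keep this mapping in a shared store so every proxy instance sees the same view.

```python
class SessionRouter:
    """Tracks which gateway process holds the PTY for each terminal session."""

    def __init__(self):
        self._owner = {}  # session_id -> gateway_id

    def register(self, session_id, gateway_id):
        """Record that gateway_id now holds this session's PTY."""
        self._owner[session_id] = gateway_id

    def route(self, session_id):
        """Return the gateway for a reconnect, or None if the PTY must be re-established."""
        return self._owner.get(session_id)

    def drop_gateway(self, gateway_id):
        """Forget a restarted gateway. Returns the sessions that lost their PTY."""
        orphaned = [s for s, g in self._owner.items() if g == gateway_id]
        for session_id in orphaned:
            del self._owner[session_id]
        return orphaned
```

A None from route is the signal to create a fresh PTY and register the new owner, which is friendlier than the silent broken terminal a pure sticky-session setup produces after a gateway restart.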

Execution resource abuse. Fork bombs, infinite loops, and memory exhaustion are standard attacks on any code execution service. Mitigations: cgroup limits for CPU and memory, wall-clock timeout enforced by the orchestrator externally, and rate limiting per user (max N concurrent sessions, max M executions per hour).
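The per-user limits (max N concurrent sessions, max M executions per hour) fit in a few lines. This is an in-memory sketch with a sliding one-hour window. UserLimiter is a hypothetical name and the injectable clock exists only to make the behavior testable. A multi-node service would back the counters with a shared store.

```python
import time
from collections import defaultdict, deque

WINDOW_S = 3600  # one hour


class UserLimiter:
    """Caps concurrent sessions and executions per hour for each user."""

    def __init__(self, max_sessions, max_runs_per_hour, clock=time.monotonic):
        self._max_sessions = max_sessions
        self._max_runs = max_runs_per_hour
        self._clock = clock
        self._sessions = defaultdict(int)  # user_id -> open session count
        self._runs = defaultdict(deque)    # user_id -> timestamps of recent runs

    def try_open_session(self, user_id):
        """Reserve a session slot. Returns False when the user is at the cap."""
        if self._sessions[user_id] >= self._max_sessions:
            return False
        self._sessions[user_id] += 1
        return True

    def close_session(self, user_id):
        """Release a session slot, never going below zero."""
        if self._sessions[user_id] > 0:
            self._sessions[user_id] -= 1

    def try_run(self, user_id):
        """Record an execution. Returns False when the hourly budget is spent."""
        now = self._clock()
        runs = self._runs[user_id]
        # Drop executions that have aged out of the one-hour window.
        while runs and now - runs[0] >= WINDOW_S:
            runs.popleft()
        if len(runs) >= self._max_runs:
            return False
        runs.append(now)
        return True
```

Rate limits cap how often a user can try something abusive. The cgroup limits and the external wall-clock timeout cap how much damage any single attempt can do, so the two are complementary.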

Decisions That Shape the Architecture

Decision	Option A	Option B	Lean toward
Sandbox isolation	Containers + seccomp	Firecracker microVMs	MicroVMs for untrusted code
Frontend editor	Monaco (~5-10MB)	CodeMirror 6 (~300KB)	Monaco if IDE-grade; CodeMirror if mobile/lightweight
Execution model	Run-and-return (stateless)	Persistent terminal (stateful)	Depends on product; stateful needs WebSocket affinity
Collaboration sync	OT (server-ordered)	CRDT/Yjs (decentralized)	Yjs for simpler server, OT for Google Docs-style control
LSP scope	Per-workspace	Shared across users	Per-workspace (isolation + correctness)
In-browser execution	WebContainers (StackBlitz)	Server-side VM	Server-side for arbitrary languages; WebContainers for Node.js only

The 45-Minute Clock

0-5 min: Clarify requirements. Single vs collaborative, execution model, languages, persistence, scale target.
5-15 min: Five-box architecture. Browser + editor, API gateway, workspace orchestrator, file service, LSP service. Draw the diagram, name the components.
15-22 min: Sandbox deep-dive. Container vs gVisor vs Firecracker. Name the tradeoff (isolation depth vs boot time vs cost). Pick Firecracker with a pre-warm pool. Cover cgroup resource limits.
22-30 min: Terminal bridge (xterm.js, WebSocket, PTY, sticky sessions) and LSP lifecycle (per-workspace, lazy start, idle kill).
30-37 min: File storage. S3 for files, Postgres for metadata. Workspace lifecycle: claim VM, mount files, flush on save, snapshot on idle, resume from snapshot. Name the 30-second flush interval and its tradeoff.
37-42 min: Bottlenecks. Cold start (pre-warm pool), LSP memory (lazy + idle kill), WebSocket affinity (sticky LB or forwarding proxy), execution abuse (cgroups + timeout + rate limits).
42-45 min: Tradeoffs. Pick two from the table above and defend them.

Five Things Every Online Code Editor System Design Needs

The sandbox is the design. Firecracker microVMs give each user their own kernel. That one decision eliminates an entire class of security vulnerabilities.
Files and compute are separate. Object storage persists projects. Ephemeral VMs run them. The orchestrator stitches them together at session start.
Cold start is the main UX bottleneck. Pre-warm pools and snapshot-based resume bring workspace start from 3+ seconds to under 500ms.
LSP servers are expensive. Lazy startup and idle-kill with cgroup memory caps keep the cost manageable.
Terminal sessions are stateful. Your load balancer routing model must account for WebSocket affinity from day one.
