Database Indexing for System Design Interviews: The Full Guide

June 3, 202611 min read
interview-prepcareersystem-designalgorithms
Database Indexing for System Design Interviews: The Full Guide
TL;DR
  • B+ tree indexes stay 4 levels deep for billions of rows; leaf nodes link as a sorted list, making range queries a single sequential walk
  • The leftmost prefix rule means a composite index on (A, B, C) helps queries filtering on A or (A, B) but not B alone; put equality columns before range columns
  • A covering index with INCLUDE columns lets the database skip the table entirely on reads, cutting I/O by 2-3x at the cost of index size
  • Partial indexes filter to just the rows you actually query, making them 100x smaller than full-column indexes on low-cardinality columns with a dominant value
  • LSM trees (Cassandra, RocksDB) convert random writes to sequential appends, trading read performance for write throughput
  • Adding 5 or more indexes to a write-heavy table can make INSERTs 3-5x slower; event logs and audit tables should stay near index-free
  • In a system design interview, always name the query, name the index, state why it works, and state the write cost rather than just saying "add an index"

You have a users table with 100 million rows. A query filtering by email takes 2.1 seconds. You add one index. It takes 4 milliseconds. You feel like a wizard.

You are not a wizard. You just stopped doing the equivalent of reading an entire phonebook to find one name.

An index is a separate data structure that maps a column's values to the physical location of matching rows. Instead of reading the whole table, the database traverses the index and jumps directly to the rows it needs. Every index you add slows writes and consumes storage. Understanding when that tradeoff is worth it is what interviewers actually care about.

Without an Index, You're Reading Every Single Row

Imagine you're looking for one specific email address. No index. The database pulls out a chair, sits down, and reads row one. Not a match. Row two. Not a match. Row 2,847,293. Also not a match. And so on, forever, until it has touched every row in the table.

That is a full table scan. O(n) reads. For a 100 million row table, that means gigabytes of data spread across thousands of disk pages. All of it, touched, for one email address.

The query planner is not lazy. It genuinely does not know where your data is unless you tell it with an index.

Why B+ Trees Are Everywhere

Most database indexes are B+ trees. Not binary trees, not B-trees exactly, but B+ trees specifically. The name is not as exciting as the structure.

In a B+ tree, all actual data pointers live in the leaf nodes. Internal nodes hold only routing keys. The routing keys are like signposts: go left if your value is less than 30, go right if it is greater. Leaf nodes are linked together in a sorted doubly-linked list.

B+ tree structure: internal nodes route, leaf nodes hold row pointers, leaves form a linked list

This design does two things exceptionally well. Range queries are cheap: once you find the start of a range, you just walk the linked list without re-traversing the tree from the root. And the tree stays shallow. With a branching factor of around 400 for integer keys, a tree 4 levels deep can index 400^4 = 25 billion rows.

For a 10-million-row table, your B+ tree is probably 4 levels deep. That means 4 page reads to find any row, regardless of table size. A sequential scan on the same table would read thousands of pages. Four reads versus thousands. That is where the 2.1 seconds became 4 milliseconds.

For more on B+ tree internals, see [/blog/b-tree-and-b-plus-tree].

Four Index Types, One Conversation

B-tree (the default)

Use for equality and range queries on most data types. When you write CREATE INDEX without specifying a type, you get a B-tree. It handles =, <, >, BETWEEN, LIKE 'prefix%', and ORDER BY. If you only remember one index type for your interview, remember B-tree.

Hash index

Use for exact equality only. Hash indexes cannot serve range queries because hash values have no ordering. No ordering means no "give me everything between January and March." Great for in-memory key-value lookups, terrible for anything sorted. In PostgreSQL, hash indexes have been crash-safe since version 10, which is how long it took for them to be safe enough to actually trust.

LSM-tree (Log-Structured Merge)

This is what Cassandra, RocksDB, and most write-heavy NoSQL stores use. Writes go to an in-memory buffer called a memtable and are periodically flushed as immutable sorted files called SSTables on disk. A read has to merge results across potentially several SSTables, which is slower. A write is always just appending to the memtable, which is very fast.

The core tradeoff: turn random writes into sequential writes, at the cost of slower reads.

In a system design interview, the choice between B-tree and LSM-tree maps directly to the choice between write-heavy and read-heavy storage engines. Write-heavy ingestion pipeline? LSM. Read-heavy analytics? B-tree. If you say this clearly, you will sound like someone who has actually thought about the storage layer instead of someone who has only heard the word "Cassandra."

See [/blog/key-value-store-system-design] for where this shows up in practice.

GIN (Generalized Inverted Index)

Use for full-text search and array containment queries. A GIN index inverts the relationship: instead of mapping rows to values, it maps tokens to the rows that contain them. Full-text search in PostgreSQL gets 100x faster with a GIN index. If your design includes search features and you are still suggesting a regular B-tree index on a description column, you have made an expensive mistake.

The Leftmost Prefix Rule

A composite index on (user_id, created_at) is not the same as two separate indexes. And it is not a magic index that helps every query involving either column.

The leftmost prefix rule: a composite index on (A, B, C) efficiently supports queries filtering on A, on (A, B), or on (A, B, C). It does not help queries that filter on B alone or C alone.

Why? Within the index, rows are sorted by A first, then B within each A group, then C within each B group. B is only sorted locally, not globally. Querying on just B without A is still a full scan of the index.

Index on (user_id, created_at):

user_id=1, created_at=2024-01-01  →  row ptr
user_id=1, created_at=2024-01-02  →  row ptr
user_id=1, created_at=2024-01-05  →  row ptr
user_id=2, created_at=2024-01-01  →  row ptr   ← created_at is NOT globally sorted
user_id=2, created_at=2024-01-03  →  row ptr

For column ordering in a composite index: equality conditions first, then range conditions, then sort columns.

If your query is WHERE user_id = ? AND status = ? ORDER BY created_at DESC, the index should be (user_id, status, created_at). Both equality columns come before the sort column. Reversing that order means the index exists but does not actually get used for sorting.

For selectivity: if one column filters down to 0.1% of rows and another to 40%, put the high-selectivity column first in most cases. An equality filter on a medium-selectivity column beats a range filter on a high-selectivity column. When in doubt, check EXPLAIN ANALYZE and see what the planner actually does.

Covering Indexes Skip the Table Entirely

A normal index lookup is two-step: find the row pointers in the index, then fetch the actual rows from the heap. Two round trips. Two sets of I/O.

A covering index contains every column the query touches. The database satisfies the entire query from the index without ever going back to the actual rows. In PostgreSQL, you add non-indexed columns using INCLUDE:

CREATE INDEX idx_orders_user ON orders (user_id, created_at) INCLUDE (status, total_amount);

A query for SELECT status, total_amount FROM orders WHERE user_id = ? ORDER BY created_at DESC now runs as an index-only scan. Benchmarks show 2 to 3x throughput improvement on read-heavy workloads. All because you stopped making the database fetch the same rows twice.

The downside is index size. Each included column inflates the index, which means slower writes and more storage. Add INCLUDE columns when you can prove heap access is the actual bottleneck, not as a pre-emptive optimization.

Partial Indexes Target the Rare Value

Here is a trap a lot of people fall into. You have an orders table. 99% of orders have status = 'completed'. 1% are 'pending'. You create an index on status because you query pending orders constantly.

Congratulations, you just built an index that covers 100 million rows to find 1 million of them. The query planner looks at the index, calculates that 99% of it is useless for your common query, and decides to just do a table scan anyway.

A partial index targets only the rows matching a condition:

CREATE INDEX idx_pending_orders ON orders (created_at) WHERE status = 'pending';

This index is 100x smaller. 100x faster to traverse. Most writes (completed orders) do not touch it at all because those rows do not match the WHERE clause.

Partial indexes are the right answer for low-cardinality columns where one value is disproportionately interesting to query. If you mention this unprompted in a system design interview, you will look like someone who has felt the pain of a bloated index before.

Every Extra Index Slows Your Writes

Here is the part that bites people. You add indexes to make reads fast. But reads are not the only thing happening to your table.

Every time a row is inserted, updated, or deleted, every index on that table must be updated to reflect the change. Add five indexes to a table and a simple INSERT now has to maintain six data structures.

Write cost multiplier grows significantly with each additional index

A table with five indexes can run INSERTs 3 to 5x slower than the same table with no indexes. The rough model:

  • 0 secondary indexes: 1x write cost
  • 1-2 secondary indexes: 1.5-2x write cost
  • 5+ secondary indexes: 3-5x write cost
  • Each index on a high-write table adds roughly +0.3 to +0.5x to that multiplier

This is why event logs, audit tables, time-series data, and message queue backing tables should have as few indexes as possible. You write far more than you read on those tables. Every index you add to an event log table is a tax you pay on every event, forever.

When to Skip the Index Entirely

Small tables. Under roughly 1,000 rows, a sequential scan is often faster. The entire table fits in a few memory pages. The overhead of traversing the B-tree and fetching individual row pointers actually exceeds the cost of just reading everything linearly. The query planner usually figures this out on its own, but now you can explain why.

Low-cardinality columns with no dominant value. A boolean is_active column where 70% of rows are true means any index on that column, when queried for true, touches 70% of the table. The planner ignores it. You spent storage and write overhead for nothing. Use a partial index targeting the rare value instead.

Write-heavy hot tables. If a table receives tens of thousands of inserts per second, minimize indexes. Read queries on that table should go through a cache or a read replica, not through fresh index scans on the same disk that is already being hammered with writes.

When the query returns more than 5-10% of the table. At some point, random I/O through an index becomes worse than sequential I/O through the whole table. The threshold is roughly 10% of leaf pages. The planner cost model handles this automatically, but understanding why it happens makes you sound like you have thought about storage hardware, not just SQL syntax.

How to Talk About Database Indexing in a System Design Interview

Most candidates mention indexes once and move on. "And I'd put an index on the user_id column." Done. Moving on.

That is not wrong. It is just not what strong candidates do.

Strong candidates treat indexing as a deliberate design decision with explicit tradeoffs. Here is the structure that works:

  1. Name the query. "The user feed endpoint does WHERE user_id = ? ORDER BY created_at DESC LIMIT 20."
  2. Name the index. "I'd put a composite B-tree index on (user_id, created_at DESC)."
  3. State why it works. "user_id is the equality filter and goes first. created_at is the sort column. The index already has rows pre-sorted by user_id and then by recency, so this query becomes a single range scan on the first few leaf pages."
  4. State the tradeoff. "Every new post or delete on this table updates the index. For a high-write table, I'd also consider whether this belongs in a separate read-optimized store with eventually consistent replication."
  5. Mention cardinality if relevant. "user_id has high selectivity. On a 500 million post table, this narrows to a few thousand rows for a typical user, well within B-tree scan territory."

The interviewer wants to see that you connect the query pattern to the index structure and can quantify both the benefit and the cost. Saying "I'll add an index" and then moving on tells them you know the vocabulary. Working through the tradeoffs tells them you actually understand the mechanism.

If the system involves search, mention GIN or Elasticsearch. If it has a write-heavy ingestion path, say those tables stay index-minimal and reads go through a cache. See [/blog/distributed-cache-system-design] for how that fits together.

For voice practice on these system design explanations, SpaceComplexity runs real-time mock interviews where you talk through designs and get scored on how clearly you communicate tradeoffs, not just whether you mentioned the right keywords.

Further Reading