Trie Data Structure: How Prefix Trees Make Autocomplete Fast

Every time you type "py" into a search bar and see "python", "pygame", and "pytest" appear instantly, something is doing the matching. A hash map can't do it. Binary search can't do it efficiently. You could iterate over every word in the dictionary, but that's the interview answer that ends with "so what would you do differently?" The trie data structure is what makes prefix lookup fast, and once you understand it, you'll recognize its shape in a surprisingly wide range of problems.

A trie (pronounced "try," from retrieval, making it one of the few data structures actually named for what it does) is a tree where each path from root to node spells out a string. The node's position in the tree is the key. You never store the full key at a node; you build it by following the path.

What's Actually in a Trie Node?

Picture a tree. The root holds nothing. Each edge is labeled with a single character. A path from root to some node spells out a prefix. If a node is marked "end of word," that path is a complete word in the dictionary.

Trie storing the words apple, app, art, and banana. Blue filled circles mark end-of-word nodes.

A trie storing "apple", "app", "art", and "banana". Filled nodes mark end-of-word. The path root → a → r → t spells "art" with no extra storage.

Every node contains exactly two things: an is-end-of-word flag and a collection of children. That's the whole data model. Children are indexed by character, so looking up the next node for character c is a single index operation, not a scan.

The Children Array vs. the Hash Map

That's where the real design tension lives. Two main approaches for storing children:

Fixed array (size = alphabet size):

children: [null, null, ..., ptr_to_p, ..., null]  // index 15 = 'p'

For lowercase English, that's 26 slots. Each pointer is 8 bytes on a 64-bit machine. So every node burns 26 × 8 = 208 bytes for children alone, plus the isEnd flag and allocator overhead. Think of it as a 26-apartment building where 24 apartments are permanently vacant. You own all of them. The utilities still run.

Hash map (or dictionary):

children: {"p": node_for_p, "r": node_for_r}

Now you only allocate for characters that actually appear. If a node has 2 children, it costs proportionally to 2. The tradeoff: hash maps have higher constant-factor overhead per entry, and they shatter cache locality. The array has perfect cache locality because children[c - 'a'] is one arithmetic op and a pointer dereference. The hash map is one hash computation plus at least one memory hop.

Side-by-side comparison of fixed array children (26 slots, mostly null) versus hash map children (only existing characters stored).

Fixed array: 208 bytes per node, O(1) lookup, cache-friendly. Hash map: proportional memory, O(1) average, scattered in heap.

Practical rule of thumb: if your alphabet is small and dense (a-z, digits), use the array. If you're handling Unicode or arbitrary byte sequences, use the map.

For Unicode, the alphabet is up to 1,114,112 code points. A fixed array becomes absurd. Even a 256-slot array (full ASCII) costs 2 KB per node. Once your strings go multilingual, you have no choice but the map.

Why Every Operation Costs O(m)

Insert, search, and delete all walk exactly m nodes for a key of length m. The dictionary size never enters the cost. That single fact is why the trie exists.

Operation	Time Complexity	Space Complexity	Notes
Insert	O(m)	O(m) worst case	m = key length; new nodes only created for missing prefix
Search	O(m)	O(1)	m = key length; just a traversal
Starts-with / prefix check	O(p)	O(1)	p = prefix length
Delete	O(m)	O(m) aux	recursive prune unwinds the stack m deep
Build from n words	O(n × m)	O(n × m)	total characters inserted
All words with prefix	O(p + k)	O(k)	k = number of results

Why O(m)?

Every insert, search, and delete traverses exactly m nodes, one per character. There's no comparison against other keys, no binary search, no hashing. The cost depends only on the length of the word you're operating on. Time is a function of the key length, not the number of keys in the structure.

A hash map with a million entries does a key lookup in O(1) amortized, but it can't do prefix search at all without a full scan. A sorted array can do prefix search with binary search in O(log n + k), but n grows with the dictionary size. The trie pays a fixed O(m) regardless of whether you have ten words or ten million.

Insert in Detail

insert("apple"):
  root -> create 'a' child
  'a'  -> create 'p' child
  'p'  -> create 'p' child
  'p'  -> create 'l' child
  'l'  -> create 'e' child
  'e'  -> set isEnd = true

insert("app"):
  root -> follow 'a' (exists)
  'a'  -> follow 'p' (exists)
  'p'  -> follow 'p' (exists)
  'p'  -> set isEnd = true   (only this new)

Shared prefixes mean no new nodes for the shared part. The entire "app" insert adds one flag change. That's prefix compression for free.

Before and after inserting the word app into a trie that already contains apple. The second p node flips from isEnd false to isEnd true with no new nodes created.

Inserting "app" into a trie with "apple": one flag changes, zero nodes allocated. The shared prefix already exists.

Delete in Detail

Deletion has three cases. Everyone gets at least one wrong the first time.

Word is a prefix of another word. ("app" is a prefix of "apple".) Just unset isEnd. Don't touch the nodes.
Word shares a prefix with another word. ("apple" and "apply" share "appl".) Unset isEnd on the divergence node, then prune the branch (the 'e' subtree in this case).
Word is standalone. Every node from root to the leaf is exclusive to this word. Delete them all bottom-up.

You can implement this with a recursive DFS that returns whether the current node should be deleted, unwinding from the leaf toward the root.

Three delete cases: word is prefix of another (unset flag only), word shares prefix with another (prune one branch), and word is standalone (delete all nodes bottom-up).

The three delete cases. The recursive approach handles all three by returning "delete me" up the call stack.

When Should You Reach for a Trie?

The signal is a shared prefix. If your strings share their heads, every other structure pays for that overlap. The trie collapses it into one path.

Prefix search and autocomplete. Build the trie from your dictionary. For any prefix query, traverse to the prefix node, then collect all words in its subtree. Time for the lookup is O(p + k), where p is the prefix length and k is the result count. No other structure gives you this cleanly.

Wildcard and pattern matching. The classic Word Search II (LeetCode 212) pattern: you have a board and a word list. Build a trie from the words, then DFS the board while simultaneously descending the trie. The trie lets you prune entire board paths the instant no word can start with your current prefix.

Spell checkers and suggestion. Walk the trie, and at any point where the input doesn't match, you're already positioned to find the nearest valid continuation.

IP routing (longest prefix match). Routers need to find the longest matching prefix in a routing table for each incoming packet. An uncompressed binary trie does this in O(32) for IPv4 and O(128) for IPv6, independent of table size. The compressed variants (radix and LC-trie) cut that constant down further by skipping single-child chains.

Dictionary validation. Does a given string exist in a set of strings? A trie answers in O(m), with no hash collisions possible.

Recognizing the Pattern

If the brute force iterates every word per query and the words share prefixes, a trie collapses that work into a single tree walk. Watch for these signals:

The problem involves a dictionary or word list and asks about prefixes, suffixes (use a reversed trie), or membership
You need to search a 2D grid for multiple words simultaneously
The brute-force solution iterates over all words for each query, and the words share common prefixes
You see "autocomplete," "typeahead," "starts with," or "longest prefix" in the problem statement

Worked example 1: Implement an autocomplete system

Problem: Given a list of words and a prefix, return all words that start with that prefix.

Brute force: iterate all words, check word.startswith(prefix). O(n × m) per query.

Why a trie fits: insert all words once, O(total characters). Each query traverses exactly p nodes to reach the prefix node, then collects results from the subtree. Query time is O(p + k), not O(n × m). For a dictionary with 100,000 words and a two-character prefix that matches 5,000 words, the trie traverses 2 nodes to position, then 5,000 nodes to collect. The brute force touches 100,000 words for every query.

Autocomplete for "ap": two hops to reach the prefix node, then collect the entire subtree. The other 99,998 dictionary entries never get visited.

Worked example 2: Word Search II (LeetCode 212)

Problem: Given an m × n board of characters and a list of words, find all words that appear on the board (adjacent cells, no reuse).

Naive approach: for each word, do a DFS on the board. O(words × board × 4^word_length). With 100 words, this explodes.

Why a trie fits: build a single trie from all words. DFS the board once, tracking your position in the trie. When the current board path leads to a trie node with no children, you prune immediately. You abort entire DFS branches the moment no remaining word can match, which you couldn't do with individual word searches. All 100 words share the same single pass over the board.

The expert optimization on top: once you've added a word to the answer set, delete it from the trie. Each removed leaf can collapse its parent chain, so future DFS branches prune even earlier. This turns the worst case from "DFS the whole board once" into "DFS the board, but the trie shrinks under you." It's the trick that gets the LeetCode submission under the time limit when naive trie pruning still times out.

When the Vanilla Trie Isn't Enough

The textbook trie is a starting point. Production systems almost always reach for a compressed variant.

Compressed trie (Radix tree / Patricia trie). Merge chains of single-child nodes into one edge labeled with a string. "apple" as a standalone word becomes a single edge from root to leaf. This eliminates the problem where sparse tries have long chains of one-child nodes. The structure was invented in 1968 by Donald R. Morrison as PATRICIA (Practical Algorithm to Retrieve Information Coded in Alphanumeric), and radix trees still power most real-world IP lookup systems.

DAWG (Directed Acyclic Word Graph). Merge common suffixes too, not just prefixes. If "testing" and "resting" are both words, the "esting" suffix nodes become shared. Compresses further but cannot represent counts per word.

Ternary Search Trie. A hybrid: each node has three children (less than, equal, greater than) rather than a fixed array. Cache-friendlier than hash map tries and space-efficient for sparse alphabets. See Sedgewick's reference implementation for the full treatment.

Trie Data Structure Complexity

Insert, search, delete: O(m) where m is key length. Independent of dictionary size.
Space: O(n × m × σ) in the worst case for an array implementation, where σ is alphabet size. With hash map children, closer to O(total characters stored).
Prefix enumeration: O(p + k) where p is prefix length and k is result count.
The array-vs-map tradeoff: array wins on cache locality and constant factor for small dense alphabets; map wins on memory for large or sparse alphabets.

The Hidden Cost Nobody Talks About

The memory problem that trips people up when they first reach for a trie is allocation pressure. Byte count is the easy thing to measure. What actually hurts at scale is how many separate heap objects you're creating.

Every node is a separate heap object. A trie with 100,000 words might have 500,000 nodes. That's 500,000 individual allocations, 500,000 GC roots (in managed languages, your garbage collector would like to have a word with you), and a heap fragmented into 500,000 scattered objects. Cache misses cascade. Traversing root-to-leaf on a deep trie is a pointer-chasing exercise across potentially hundreds of thousands of scattered memory locations.

The array-backed C implementation above is dramatically more cache-friendly than any hash-map-backed implementation because the 26 child pointers all live in the same contiguous block. You pay the memory waste but gain locality.

For production systems where trie traversal is a hot path, flat array tries (packing the entire structure into a single array with index arithmetic instead of pointers) are common. Linux's IPv4 routing uses a level-compressed trie in net/ipv4/fib_trie.c, based on the Nilsson & Karlsson 1999 LC-trie paper. Redis uses rax, antirez's radix tree. Go's standard net/http mux is pattern-matched, not a trie, but httprouter (and most third-party routers like chi and gin) explicitly use a radix tree for path routing. The vanilla textbook trie is a starting point, not an endpoint.

One Structure, Every Language

class TrieNode:
    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end

    def starts_with(self, prefix: str) -> bool:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

    def words_with_prefix(self, prefix: str) -> list[str]:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        results: list[str] = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node: TrieNode, path: str, results: list[str]) -> None:
        if node.is_end:
            results.append(path)
        for char, child in node.children.items():
            self._collect(child, path + char, results)

Cheat Sheet

Question	Answer
What does a node store?	An `isEnd` flag and a children collection
Cost of insert / search / delete	O(m) where m = key length, independent of dictionary size
Cost of prefix enumeration	O(p + k) for prefix length p and k results
Array children vs. hash map children	Array: faster, cache-friendly, 208 B/node wasted. Map: only pays for chars present, scattered in heap
Hidden cost at scale	One heap allocation per node → fragmentation, GC pressure, pointer-chasing cache misses
When to reach for it	Prefix matching, autocomplete, multi-word grid search, longest-prefix lookup
When to skip it	Single key lookups with no prefix structure (a hash map is simpler and uses less memory)

If you want to practice identifying when to use a trie versus when a hash map is actually fine, SpaceComplexity puts you in a real interview format where you have to defend your data structure choice out loud. Knowing the data structure is one thing. Explaining why you chose it, live, with someone timing you, is a different skill entirely.

For more on how data structure choice affects your overall problem-solving approach, see our guides on communicating your approach under pressure and recognizing algorithm patterns from problem signals.