Trie Data Structure: How Prefix Trees Make Autocomplete Fast

- The trie data structure encodes strings as root-to-node paths in a tree; the position of a node is its key, with no full string stored at any node
- Insert, search, and delete all run in O(m) where m is the key length, completely independent of how many strings are stored
- Prefix search runs in O(p + k) where p is the prefix length and k is the result count — no hash map or sorted array can match this for autocomplete queries
- Array vs. hash map children: fixed arrays win on cache locality for small alphabets (a-z); hash maps win on memory for Unicode or sparse character sets
- Word Search II pattern: build a trie from the word list, DFS the board once, and prune entire paths the moment no remaining word can start with your current prefix
- Production systems (Linux routing, Redis, Go's HTTP router) use radix trees or flat-array tries to avoid the heap fragmentation of millions of individual node allocations
Every time you type "py" into a search bar and see "python", "pygame", and "pytest" appear instantly, something is doing the matching. A hash map can't do it. Binary search can't do it efficiently. You could iterate over every word in the dictionary, but that's the interview answer that ends with "so what would you do differently?" The trie data structure is what makes prefix lookup fast, and once you understand it, you'll recognize its shape in a surprisingly wide range of problems.
A trie (pronounced "try," from retrieval, making it one of the few data structures actually named for what it does) is a tree where each path from root to node spells out a string. The node's position in the tree is the key. You never store the full key at a node; you build it by following the path.
The Memory Layout
Picture a tree. The root holds nothing. Each edge is labeled with a single character. A path from root to some node spells out a prefix. If a node is marked "end of word," that path is a complete word in the dictionary.

A trie storing "apple", "app", "art", and "banana". Filled nodes mark end-of-word. The path root → a → r → t spells "art" with no extra storage.
Every node contains exactly two things: an is-end-of-word flag and a collection of children. That's the whole data model. Children are indexed by character, so looking up the next node for character c is a single index operation, not a scan.
The Children Array vs. the Hash Map
That's where the real design tension lives. Two main approaches for storing children:
Fixed array (size = alphabet size):
children: [null, null, ..., ptr_to_p, ..., null] // index 15 = 'p'
For lowercase English, that's 26 slots. Each pointer is 8 bytes on a 64-bit machine. So every node burns 26 × 8 = 208 bytes for children alone, plus the isEnd flag and allocator overhead. Think of it as a 26-apartment building where 24 apartments are permanently vacant. You own all of them. The utilities still run.
Hash map (or dictionary):
children: {"p": node_for_p, "r": node_for_r}
Now you only allocate for characters that actually appear. If a node has 2 children, it costs proportionally to 2. The tradeoff: hash maps have higher constant-factor overhead per entry, and they shatter cache locality. The array has perfect cache locality because children[c - 'a'] is one arithmetic op and a pointer dereference. The hash map is one hash computation plus at least one memory hop.

Fixed array: 208 bytes per node, O(1) lookup, cache-friendly. Hash map: proportional memory, O(1) average, scattered in heap.
Practical rule of thumb: if your alphabet is small and dense (a-z, digits), use the array. If you're handling Unicode or arbitrary byte sequences, use the map.
For Unicode, the alphabet is up to 1,114,112 code points. A fixed array becomes absurd. Even a 256-slot array (full ASCII) costs 2 KB per node. Once your strings go multilingual, you have no choice but the map.
Core Operations
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Insert | O(m) | O(m) worst case | m = key length; new nodes only created for missing prefix |
| Search | O(m) | O(1) | m = key length; just a traversal |
| Starts-with / prefix check | O(p) | O(1) | p = prefix length |
| Delete | O(m) | O(1) amortized | flag unset, pruning optional |
| Build from n words | O(n × m) | O(n × m) | total characters inserted |
| All words with prefix | O(p + k) | O(k) | k = number of results |
Why O(m)?
Every insert, search, and delete traverses exactly m nodes, one per character. There's no comparison against other keys, no binary search, no hashing. The cost depends only on the length of the word you're operating on. Time is a function of the key length, not the number of keys in the structure.
A hash map with a million entries does a key lookup in O(1) amortized, but it can't do prefix search at all without a full scan. A sorted array can do prefix search with binary search in O(log n + k), but n grows with the dictionary size. The trie pays a fixed O(m) regardless of whether you have ten words or ten million.
Insert in Detail
insert("apple"):
root -> create 'a' child
'a' -> create 'p' child
'p' -> create 'p' child
'p' -> create 'l' child
'l' -> create 'e' child
'e' -> set isEnd = true
insert("app"):
root -> follow 'a' (exists)
'a' -> follow 'p' (exists)
'p' -> follow 'p' (exists)
'p' -> set isEnd = true (only this new)
Shared prefixes mean no new nodes for the shared part. The entire "app" insert adds one flag change. That's prefix compression for free.

Inserting "app" into a trie with "apple": one flag changes, zero nodes allocated. The shared prefix already exists.
Delete in Detail
Deletion has three cases. Everyone gets at least one wrong the first time.
-
Word is a prefix of another word. ("app" is a prefix of "apple".) Just unset
isEnd. Don't touch the nodes. -
Word shares a prefix with another word. ("apple" and "apply" share "appl".) Unset
isEndon the divergence node, then prune the branch (the 'e' subtree in this case). -
Word is standalone. Every node from root to the leaf is exclusive to this word. Delete them all bottom-up.
You can implement this with a recursive DFS that returns whether the current node should be deleted, unwinding from the leaf toward the root.

The three delete cases. The recursive approach handles all three by returning "delete me" up the call stack.
What Problems It Solves
Tries dominate in any domain where strings share structure at their heads.
Prefix search and autocomplete. Build the trie from your dictionary. For any prefix query, traverse to the prefix node, then collect all words in its subtree. Time for the lookup is O(p + k), where p is the prefix length and k is the result count. No other structure gives you this cleanly.
Wildcard and pattern matching. The classic "Word Search II" (LeetCode 212) pattern: you have a board and a word list. Build a trie from the words, then DFS the board while simultaneously descending the trie. The trie lets you prune entire board paths the instant no word can start with your current prefix.
Spell checkers and suggestion. Walk the trie, and at any point where the input doesn't match, you're already positioned to find the nearest valid continuation.
IP routing (longest prefix match). Routers need to find the longest matching prefix in a routing table for each incoming packet. A trie (specifically a radix tree, the compressed variant) does this in O(32) for IPv4, O(128) for IPv6, independent of table size.
Dictionary validation. Does a given string exist in a set of strings? A trie answers in O(m), with no hash collisions possible.
Recognizing the Pattern
The signals that a problem calls for a trie:
- The problem involves a dictionary or word list and asks about prefixes, suffixes (use a reversed trie), or membership
- You need to search a 2D grid for multiple words simultaneously
- The brute-force solution iterates over all words for each query, and the words share common prefixes
- You see "autocomplete," "typeahead," "starts with," or "longest prefix" in the problem statement
Worked example 1: Implement an autocomplete system
Problem: Given a list of words and a prefix, return all words that start with that prefix.
Brute force: iterate all words, check word.startswith(prefix). O(n × m) per query.
Why a trie fits: insert all words once, O(total characters). Each query traverses exactly p nodes to reach the prefix node, then collects results from the subtree. Query time is O(p + k), not O(n × m). For a dictionary with 100,000 words and a two-character prefix that matches 5,000 words, the trie traverses 2 nodes to position, then 5,000 nodes to collect. The brute force touches 100,000 words for every query.

Autocomplete for "ap": two hops to reach the prefix node, then collect the entire subtree. The other 99,998 dictionary entries never get visited.
Worked example 2: Word Search II (LeetCode 212)
Problem: Given an m × n board of characters and a list of words, find all words that appear on the board (adjacent cells, no reuse).
Naive approach: for each word, do a DFS on the board. O(words × board × 4^word_length). With 100 words, this explodes.
Why a trie fits: build a single trie from all words. DFS the board once, tracking your position in the trie. When the current board path leads to a trie node with no children, you prune immediately. You abort entire DFS branches the moment no remaining word can match, which you couldn't do with individual word searches. All 100 words share the same single pass over the board.
Trie Variants Worth Knowing
Compressed trie (Radix tree / Patricia trie). Merge chains of single-child nodes into one edge labeled with a string. "apple" as a standalone word becomes a single edge from root to leaf. This eliminates the problem where sparse tries have long chains of one-child nodes. Radix trees power Linux kernel routing tables and many real-world IP lookup systems.
DAWG (Directed Acyclic Word Graph). Merge common suffixes too, not just prefixes. If "testing" and "resting" are both words, the "esting" suffix nodes become shared. Compresses further but cannot represent counts per word.
Ternary Search Trie. A hybrid: each node has three children (less than, equal, greater than) rather than a fixed array. Cache-friendlier than hash map tries and space-efficient for sparse alphabets.
Trie Data Structure Complexity
- Insert, search, delete: O(m) where m is key length. Independent of dictionary size.
- Space: O(n × m × σ) in the worst case for an array implementation, where σ is alphabet size. With hash map children, closer to O(total characters stored).
- Prefix enumeration: O(p + k) where p is prefix length and k is result count.
- The array-vs-map tradeoff: array wins on cache locality and constant factor for small dense alphabets; map wins on memory for large or sparse alphabets.
The Hidden Cost Nobody Talks About
The memory problem that trips people up when they first reach for a trie is allocation pressure. Byte count is the easy thing to measure. What actually hurts at scale is how many separate heap objects you're creating.
Every node is a separate heap object. A trie with 100,000 words might have 500,000 nodes. That's 500,000 individual allocations, 500,000 GC roots (in managed languages, your garbage collector would like to have a word with you), and a heap fragmented into 500,000 scattered objects. Cache misses cascade. Traversing root-to-leaf on a deep trie is a pointer-chasing exercise across potentially hundreds of thousands of scattered memory locations.
The array-backed C implementation above is dramatically more cache-friendly than any hash-map-backed implementation because the 26 child pointers all live in the same contiguous block. You pay the memory waste but gain locality.
For production systems where trie traversal is a hot path, flat array tries (packing the entire structure into a single array with index arithmetic instead of pointers) are common. Linux's routing uses a level-compressed trie packed into contiguous memory. Go's net/http router uses a radix tree. Redis uses a rax tree (radix tree). The vanilla textbook trie is a starting point, not an endpoint.
Implementations
class TrieNode: def __init__(self): self.children: dict[str, "TrieNode"] = {} self.is_end = False class Trie: def __init__(self): self.root = TrieNode() def insert(self, word: str) -> None: node = self.root for char in word: if char not in node.children: node.children[char] = TrieNode() node = node.children[char] node.is_end = True def search(self, word: str) -> bool: node = self.root for char in word: if char not in node.children: return False node = node.children[char] return node.is_end def starts_with(self, prefix: str) -> bool: node = self.root for char in prefix: if char not in node.children: return False node = node.children[char] return True def words_with_prefix(self, prefix: str) -> list[str]: node = self.root for char in prefix: if char not in node.children: return [] node = node.children[char] results: list[str] = [] self._collect(node, prefix, results) return results def _collect(self, node: TrieNode, path: str, results: list[str]) -> None: if node.is_end: results.append(path) for char, child in node.children.items(): self._collect(child, path + char, results)
Recap
- A trie stores strings by encoding each character as an edge in a tree. The path from root to node is the key.
- Insert, search, and delete cost O(m) where m is the key length, independent of how many strings are stored.
- Prefix search costs O(p + k), where p is the prefix length and k is the result count. No other common structure matches this.
- Children can be stored as a fixed array (fast, cache-friendly, wastes memory for sparse alphabets) or a hash map (memory-efficient, scattered in memory).
- The memory cost is real: one heap allocation per node means fragmentation and cache pressure at scale.
- Reach for a trie when problems involve prefix matching, wildcard search across a word list, or any time you see "starts with" in the problem statement.
If you want to practice identifying when to use a trie versus when a hash map is actually fine, SpaceComplexity puts you in a real interview format where you have to defend your data structure choice out loud. Knowing the data structure is one thing. Explaining why you chose it, live, with someone timing you, is a different skill entirely.
For more on how data structure choice affects your overall problem-solving approach, see our guides on communicating your approach under pressure and recognizing algorithm patterns from problem signals.