Radix Tree Data Structure: The Compressed Trie Behind URL Routing and IP Lookup

- Compression invariant: every internal non-terminal node has at least two children, dropping node count from O(n×k) to O(n) regardless of key length.
- 2n-1 node bound: proved by partitioning nodes into leaves, internal terminals, and internal non-terminals using a child-counting argument.
- All operations are O(k): characters are never re-examined; the edge split on insert is O(1) structural pointer work after O(k) traversal.
- Children keyed by first character: O(1) child dispatch regardless of edge label length, with full label comparison following.
- Production deployments: httprouter, Gin, Echo (URL routing), nginx (IP matching), Linux page cache, Redis Streams (antirez/rax).
- Reach for it when: you need prefix enumeration, longest-prefix-match, or a standard trie's O(n×k) node count is unacceptable.
Your standard trie works. O(k) per operation, prefix queries a hash map cannot touch, everyone goes home happy. Except for one problem: ten thousand keys averaging twenty characters each might give you two hundred thousand nodes. Most of them are single-child chains that exist only to spell out shared prefixes one character at a time. Each one holds one character, points to the next node, and does absolutely nothing else. A corridor that leads to a corridor that leads to a corridor.
The radix tree data structure (also called a compressed trie or PATRICIA trie) fixes that with one rule: collapse every chain of single-child nodes into a single edge labeled with a multi-character string. Same O(k) speed. Node count drops to O(n) regardless of key length. That is why Gin, Echo, and httprouter use radix trees for HTTP routing, why nginx uses one for IP matching, and why the Linux kernel has used one for its page cache since the 2.5 series.
Reach for a radix tree when you need prefix queries and the keys share long common prefixes, when you need longest-prefix-match (URL routing, IP subnets), or when a standard trie's memory footprint is unacceptable.
From Trie to Radix Tree: The One Compression Rule
Take the words "apple", "apply", "application", and "apt". Build a standard trie. You get a chain a-p that every key must traverse, then another p that three of the four keys traverse, then a-p-p-l that three keys traverse before branching. The trie has twenty-seven nodes. None of the interior chain nodes do any branching work.
Now apply the compression rule. Any node that has exactly one child and is not itself a stored key gets merged with that child. The edge between them becomes the concatenation of both labels. Repeat until no such node exists.
The result for those four keys:
root
└── "ap" ──► node
├── "pl" ──► node
│ ├── "e" (is_end: apple)
│ ├── "ication" (is_end: application)
│ └── "y" (is_end: apply)
└── "t" (is_end: apt)
Seven nodes instead of twenty-seven. That is the entire sell. The compression invariant: every internal, non-terminal node has at least two children. A node that is both a stored key and a branching point can have one or more children. But a node that is not a stored key must branch to at least two children, or it gets merged with its only child.
Left: standard trie. Single-character nodes all the way down, most of them doing nothing but pointing forward. Right: radix tree with 7 nodes and multi-character edge labels.
What a Radix Node Stores
A radix node stores exactly three things:
RadixNode:
edge_label : string // the label on the incoming edge from parent
children : Map[char → RadixNode] // keyed by first char of child's edge
is_end : bool // true if this node is a stored key
The children map is indexed by the first character of each child's edge label, giving O(1) child lookup regardless of edge label length. You take key[i], look up that character in the children map, then compare the full edge label against key[i..i+len-1].
Storing the edge label in the child node (rather than in the parent's map value) makes the node self-describing and simplifies iteration. The root node always has an empty edge label.
The children map is keyed by the first character of each child's edge label. O(1) branch selection, then one linear scan of the edge to confirm the match.
Why You Never Have More Than 2n-1 Nodes
This is the proof that turns "a compressed trie" from a vibes-based description into a guarantee with a number attached.
Claim: a radix tree storing n keys has at most 2n-1 nodes total.
Proof. Partition nodes into three types: leaves L (no children; since any childless non-terminal node gets merged upward, every leaf is a stored key), internal terminal nodes I_t (is_end=true, at least one child), and internal non-terminal nodes I_nt (is_end=false, at least two children by the compression invariant). Every stored key is a leaf or an internal terminal, so I_t ≤ n − L.
Each non-root node is a child of exactly one parent. Count total children:
L + I_t + I_nt − 1 ≥ I_t + 2·I_nt (each I_t gives ≥1 child, each I_nt gives ≥2)
L − 1 ≥ I_nt
Total nodes = L + I_t + I_nt ≤ L + (n − L) + (L − 1) = n + L − 1 ≤ 2n − 1.
A standard trie uses O(n × k) nodes in the worst case. n strings, no shared prefixes, each character its own node. A radix tree always uses O(n) nodes, independent of key length. The edge label strings cost O(total input characters), which is data you were storing anyway.
Every Operation Is O(k)
| Operation | Time | Extra Space |
|---|---|---|
| Search | O(k) | O(1) |
| Insert | O(k) | O(k) |
| Delete | O(k) | O(1) |
| Prefix enumeration | O(|prefix| + output) | O(depth) |
| Build from n keys | O(n × k) | O(n × k) |
k = length of the key being operated on. depth = tree height ≤ k.
Search Is O(k) Because Characters Are Never Re-examined
Start at root, depth=0. At each step: look up key[depth] in the current node's children map (O(1)). Compare the child's edge label against key[depth..depth+len-1] character by character. If it matches, advance depth by len and move to the child. If it mismatches, return false.
The total characters compared across the entire path sum to exactly k. Each character is compared at most once. You never backtrack. The lookup in each node's children map is O(1) (hash map or array). Total: O(k).
A standard trie search is also O(k), but each comparison costs exactly one character. The radix tree takes multi-character strides, touching fewer nodes and causing fewer cache misses on real key distributions. Same big-O. The constant is not the same.
Insert Is O(k) with an O(1) Edge Split
Walk the tree exactly like search until one of three cases:
Case 1: Key runs out at a node boundary. Mark that node is_end = true. Done.
Case 2: No child for the next character. Add a new leaf with the remaining key as its edge label. Done.
Case 3: The new key and an existing edge label share a common prefix of length j, then diverge. Say the edge label is "application" and you are inserting "apple". They share "appl" (j=4), then 'i' vs 'e'. You:
- Create a new node
splitwith edge labellabel[:j]("appl") - Update the old child's edge label to
label[j:]("ication") - Add the old child as
split.children[label[j]](keyed by 'i') - Add a new leaf for the remaining new key
key[depth+j:]("e") assplit.children[key[depth+j]] - Replace the old child in the current node with
split
The structural change in case 3 is O(1) pointer operations. The O(k) cost is the traversal to find the split point.
The structural change is O(1): create split node, update two pointers, add one new leaf. The O(k) cost is the traversal to find the divergence point.
Delete Is O(k) with Optional Recompression
Walk to the terminal node (O(k)) tracking the parent-child path. Unmark is_end. Then:
- If the node has no children, remove it from its parent. Walk up: if the parent is now a childless non-terminal, remove it too. Repeat.
- If after pruning, a node is a non-terminal with exactly one child, merge it with that child (concatenate edge labels). This restores the compression invariant.
Step 1 is mandatory for correctness. A childless non-terminal node is unreachable and has no reason to exist. Step 2 is optional. Skipping it is technically correct but gives you a tree that gradually loses compression after heavy deletion. Fine for write-heavy then read-only workloads. Not great otherwise.
Prefix Enumeration Is O(|prefix| + output)
Walk to the subtree root that represents the end of the prefix (same as search, but stop when the prefix is consumed). Then collect all is_end nodes in that subtree via DFS. The DFS visits exactly the nodes in the result set. Cost: O(|prefix|) to locate, O(output) to collect.
One Structure, Every Language
The node stores edge_label (incoming edge string), children (map keyed by first character), and is_end. All implementations below provide insert, search, and prefix enumeration. The Python 3 version also shows delete in full.
class RadixNode: __slots__ = ("edge_label", "children", "is_end") def __init__(self, edge_label: str = "") -> None: self.edge_label = edge_label self.children: dict[str, "RadixNode"] = {} self.is_end = False class RadixTree: def __init__(self) -> None: self.root = RadixNode() def insert(self, key: str) -> None: node = self.root depth = 0 while depth < len(key): ch = key[depth] if ch not in node.children: leaf = RadixNode(key[depth:]) leaf.is_end = True node.children[ch] = leaf return child = node.children[ch] label = child.edge_label j = 0 while j < len(label) and depth + j < len(key) and label[j] == key[depth + j]: j += 1 if j == len(label): depth += j node = child continue # Split edge at position j split = RadixNode(label[:j]) child.edge_label = label[j:] split.children[label[j]] = child node.children[ch] = split if depth + j < len(key): leaf = RadixNode(key[depth + j:]) leaf.is_end = True split.children[key[depth + j]] = leaf else: split.is_end = True return node.is_end = True def search(self, key: str) -> bool: node = self.root depth = 0 while depth < len(key): ch = key[depth] if ch not in node.children: return False child = node.children[ch] label = child.edge_label if key[depth : depth + len(label)] != label: return False depth += len(label) node = child return node.is_end def starts_with(self, prefix: str) -> list[str]: node = self.root depth = 0 while depth < len(prefix): ch = prefix[depth] if ch not in node.children: return [] child = node.children[ch] label = child.edge_label j = 0 while j < len(label) and depth + j < len(prefix) and label[j] == prefix[depth + j]: j += 1 if depth + j == len(prefix): results: list[str] = [] self._collect(child, prefix[:depth] + label, results) return results if j < len(label): return [] depth += len(label) node = child results = [] self._collect(node, prefix, results) return results def _collect(self, node: RadixNode, acc: str, results: list[str]) -> None: if node.is_end: results.append(acc) for child in node.children.values(): self._collect(child, acc + child.edge_label, results) def delete(self, key: str) -> bool: path: list[tuple[RadixNode, str]] = [] node = self.root depth = 0 while depth < len(key): ch = key[depth] if ch not in node.children: return False child = node.children[ch] label = child.edge_label if key[depth : depth + len(label)] != label: return False path.append((node, ch)) depth += len(label) node = child if not node.is_end: return False node.is_end = False # Prune childless non-terminal nodes upward while path and not node.children and not node.is_end: parent, ch = path.pop() del parent.children[ch] node = parent # Merge node with its only child if it is now non-terminal and single-child if path and not node.is_end and len(node.children) == 1: (only_ch, only_child), = node.children.items() node.edge_label += only_child.edge_label node.children = only_child.children node.is_end = only_child.is_end return True
Where Radix Trees Ship in Production
URL routing with path parameters. An HTTP router must map /users/42/posts/7 to a handler function. A hash map fails immediately: the actual URL was never inserted, only the pattern /users/:id/posts/:postId. A radix tree walks the path segments, recognizing literal characters ('/', 'u', 's', ...) until it hits a node whose child starts with ':' or '*', then switches into parameter-capture mode. Gin and Echo register hundreds of routes and resolve any incoming URL in O(|path|) with zero allocations.
IP longest-prefix-match routing. Network routers must find the most specific matching subnet for each packet's destination address. For IP, that means the longest prefix of the destination that appears in the routing table. A radix tree (at the bit level: each node branches on one bit of the IP address) walks the address from most-significant to least-significant bit, tracking the last match seen. When the walk ends, the last match is the longest prefix. Nginx uses this for IP access control; hardware routers use ternary CAM (essentially hardware radix trees) for line-rate routing.
Autocomplete and dictionary prefix search. Any system that suggests completions for a partial word uses some form of prefix index. A hash map cannot answer "give me all words starting with 'appl'". You would have to scan everything. A sorted array can, with O(log n + output) binary search. A radix tree answers in O(|prefix| + output), and for large key sets with shared prefixes (a dictionary, a domain name catalog, a code identifier index), uses far less memory than a standard trie.
DNS resolution. Domain names are traversed in reverse (com → example → www) for zone matching. Each label is a segment. Radix trees over domain name labels let resolvers find the deepest authoritative zone for any query in O(|name|).
Key-value store namespacing. Redis Streams uses a radix tree (antirez wrote rax himself) to store consumer group state keyed by stream IDs. Package managers use them to find all packages in a namespace (e.g., all packages starting with @babel/).
Five Signals to Reach for a Radix Tree
The unifying pattern: you need to match partial keys against a stored set, not whole keys. Hash maps and exact-match indexes don't help here.
- The problem says "find all keys with prefix X" and a hash set is not enough.
- You need longest-prefix-match: given a query, find the stored key that is the longest prefix of it.
- The keys share substantial common prefixes and a standard trie's memory is a concern.
- You need ordered enumeration within a namespace (all keys from "aa" to "az" in alphabetical order).
- You need wildcard or parameterized matching along a hierarchical path.
Worked example 1: autocomplete system.
Problem: given a dictionary of n words, answer autocomplete queries. Each query gives a prefix p and wants all words starting with p.
Brute force: scan all n words and check word.startswith(p). O(n × |p|) per query.
Hash set: no help. You cannot enumerate all strings with a given prefix from a hash set without scanning everything.
Sorted array + binary search: binary search for the first word ≥ p and the first word ≥ p+1 (the next prefix), then collect. O(log n + output). Reasonable, but building the prefix boundary requires careful comparison logic and the structure does not support insertions efficiently.
Radix tree: walk p through the tree in O(|p|). If no node for p exists, return empty. If a node exists, DFS from there to collect all is_end nodes. O(|p| + output). Insertions and deletions are O(|word|) without rebalancing. The structure is correct by construction: the tree encodes exactly the set of stored words.
Worked example 2: HTTP router.
Problem: given a set of route patterns like /api/users, /api/users/:id, /api/posts, /api/posts/:id/comments, map incoming request paths to handlers.
Hash map: fails immediately. The incoming path /api/users/42 does not appear as a key. You would need to try all routes and check each one, which is O(routes) per request.
Regex: works but O(routes × |path|) per request (every regex must be tried).
Radix tree: insert each route pattern. Literal segments become edge labels. When the pattern reaches a :id segment, that position in the tree gets a special wildcard child. Lookup walks the tree character by character, and when it hits a wildcard, it consumes the segment as a parameter. O(|path|) per lookup, regardless of how many routes are registered. This is exactly how httprouter works and why it is orders of magnitude faster than regex-based routers under load.
Four routes, one shared prefix. The wildcard :id node captures whatever segment is in that position of the incoming URL, in O(|path|) with zero allocations.
When to Use the Radix Tree Data Structure
- A radix tree is a trie where every chain of single-child, non-terminal nodes collapses into a single edge with a multi-character label. One rule, enormous impact.
- The compression invariant (every internal non-terminal node has at least two children) guarantees O(n) total nodes for n stored keys.
- All operations (search, insert, delete) are O(k) where k is the key length. Characters are never re-examined. No amortized tricks: every operation is worst-case O(k).
- Insert has one non-obvious case: the edge split. Finding the divergence point is O(k). The structural change is O(1). Three pointer updates and you are done.
- Delete requires pruning childless non-terminal nodes and optionally merging newly single-child non-terminal nodes. Step 1 is mandatory. Step 2 is optional but you will regret skipping it eventually.
- Real deployments: httprouter/Gin/Echo (Go HTTP routing), nginx (IP access control), Linux page cache (since 2.5), Redis Streams (antirez/rax), network routing tables.
- Not a replacement for hash maps on exact-match workloads. The radix tree wins on prefix queries, longest-prefix-match, and structured key hierarchies.
If you want to practice recognizing when to reach for a radix tree versus a standard trie or a sorted map, SpaceComplexity runs voice-based DSA mock interviews where an AI interviewer probes your choice of data structure and asks you to justify the trade-offs out loud. The kind of question that trips people up is exactly this one: "You said trie. What's the memory cost, and how would you reduce it?"
For more on the underlying trie mechanics, see Trie Data Structure: How Prefix Trees Make Autocomplete Fast. For the hash map internals that back each node's children map, see Hash Map Time Complexity: How O(1) Really Works (and When It Doesn't). If you're working on suffix-based indexing alongside prefix indexing, Suffix Array: How 4n Bytes Beat the Suffix Tree at Its Own Game covers the complementary structure.
Further Reading
- Radix tree (Wikipedia): the canonical reference with pseudocode for all operations and a full history of PATRICIA trees.
- Trie (Wikipedia): covers the standard trie in depth; essential context for understanding what radix trees compress.
- Compressed Tries (GeeksforGeeks): worked examples with diagrams of the insert and split steps.
- The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases (Leis, Kemper, Neumann, ICDE 2013): the paper that introduced Node4/Node16/Node48/Node256 adaptive node types, solving the memory problem with the fixed 256-entry children array.
- Trees I: Radix trees (LWN.net): the original LWN article on the Linux kernel's radix tree implementation (2005), covering the page-cache use case and the locking model.
- antirez/rax on GitHub: the radix tree Redis uses for Streams, written in ANSI C with a detailed README explaining the node layout and iterator design.