Radix Tree Data Structure: The Compressed Trie Behind URL Routing and IP Lookup

Your standard trie works. O(k) per operation, prefix queries a hash map cannot touch, everyone goes home happy. Except for one problem: ten thousand keys averaging twenty characters each might give you two hundred thousand nodes. Most of them are single-child chains that exist only to spell out shared prefixes one character at a time. Each one holds one character, points to the next node, and does absolutely nothing else. A corridor that leads to a corridor that leads to a corridor.

The radix tree data structure (also called a compressed trie or PATRICIA trie) fixes that with one rule: collapse every chain of single-child nodes into a single edge labeled with a multi-character string. Same O(k) speed. Node count drops to O(n) regardless of key length. That is why Gin, Echo, and httprouter use radix trees for HTTP routing, why nginx uses one for IP matching, and why the Linux kernel has used one for its page cache since the 2.5 series.

Reach for a radix tree when you need prefix queries and the keys share long common prefixes, when you need longest-prefix-match (URL routing, IP subnets), or when a standard trie's memory footprint is unacceptable.

From Trie to Radix Tree: The One Compression Rule

Take the words "apple", "apply", "application", and "apt". Build a standard trie. You get a chain a-p that every key must traverse, then another p that three of the four keys traverse, then a-p-p-l that three keys traverse before branching. The trie has twenty-seven nodes. None of the interior chain nodes do any branching work.

Now apply the compression rule. Any node that has exactly one child and is not itself a stored key gets merged with that child. The edge between them becomes the concatenation of both labels. Repeat until no such node exists.

The result for those four keys:

root
└── "ap" ──► node
              ├── "pl" ──► node
              │             ├── "e"         (is_end: apple)
              │             ├── "ication"   (is_end: application)
              │             └── "y"         (is_end: apply)
              └── "t"                       (is_end: apt)

Seven nodes instead of twenty-seven. That is the entire sell. The compression invariant: every internal, non-terminal node has at least two children. A node that is both a stored key and a branching point can have one or more children. But a node that is not a stored key must branch to at least two children, or it gets merged with its only child.

Standard trie with single-char nodes chained one-per-character versus 7-node compressed radix tree for apple, apply, application, apt Left: standard trie. Single-character nodes all the way down, most of them doing nothing but pointing forward. Right: radix tree with 7 nodes and multi-character edge labels.

What a Radix Node Stores

A radix node stores exactly three things:

RadixNode:
  edge_label : string          // the label on the incoming edge from parent
  children   : Map[char → RadixNode]  // keyed by first char of child's edge
  is_end     : bool            // true if this node is a stored key

The children map is indexed by the first character of each child's edge label, giving O(1) child lookup regardless of edge label length. You take key[i], look up that character in the children map, then compare the full edge label against key[i..i+len-1].

Storing the edge label in the child node (rather than in the parent's map value) makes the node self-describing and simplifies iteration. The root node always has an empty edge label.

RadixNode struct with three fields: edge_label string, children map keyed by first char, and is_end boolean with arrows to child nodes The children map is keyed by the first character of each child's edge label. O(1) branch selection, then one linear scan of the edge to confirm the match.

Why You Never Have More Than 2n-1 Nodes

This is the proof that turns "a compressed trie" from a vibes-based description into a guarantee with a number attached.

Claim: a radix tree storing n keys has at most 2n-1 nodes total.

Proof. Partition nodes into three types: leaves L (no children; since any childless non-terminal node gets merged upward, every leaf is a stored key), internal terminal nodes I_t (is_end=true, at least one child), and internal non-terminal nodes I_nt (is_end=false, at least two children by the compression invariant). Every stored key is a leaf or an internal terminal, so I_t ≤ n − L.

Each non-root node is a child of exactly one parent. Count total children:

L + I_t + I_nt − 1  ≥  I_t + 2·I_nt   (each I_t gives ≥1 child, each I_nt gives ≥2)
L − 1  ≥  I_nt

Total nodes = L + I_t + I_nt ≤ L + (n − L) + (L − 1) = n + L − 1 ≤ 2n − 1.

A standard trie uses O(n × k) nodes in the worst case. n strings, no shared prefixes, each character its own node. A radix tree always uses O(n) nodes, independent of key length. The edge label strings cost O(total input characters), which is data you were storing anyway.

Every Operation Is O(k)

Operation	Time	Extra Space
Search	O(k)	O(1)
Insert	O(k)	O(k)
Delete	O(k)	O(1)
Prefix enumeration	O(\|prefix\| + output)	O(depth)
Build from n keys	O(n × k)	O(n × k)

k = length of the key being operated on. depth = tree height ≤ k.

Search Is O(k) Because Characters Are Never Re-examined

Start at root, depth=0. At each step: look up key[depth] in the current node's children map (O(1)). Compare the child's edge label against key[depth..depth+len-1] character by character. If it matches, advance depth by len and move to the child. If it mismatches, return false.

The total characters compared across the entire path sum to exactly k. Each character is compared at most once. You never backtrack. The lookup in each node's children map is O(1) (hash map or array). Total: O(k).

A standard trie search is also O(k), but each comparison costs exactly one character. The radix tree takes multi-character strides, touching fewer nodes and causing fewer cache misses on real key distributions. Same big-O. The constant is not the same.

Insert Is O(k) with an O(1) Edge Split

Walk the tree exactly like search until one of three cases:

Case 1: Key runs out at a node boundary. Mark that node is_end = true. Done.

Case 2: No child for the next character. Add a new leaf with the remaining key as its edge label. Done.

Case 3: The new key and an existing edge label share a common prefix of length j, then diverge. Say the edge label is "application" and you are inserting "apple". They share "appl" (j=4), then 'i' vs 'e'. You:

Create a new node split with edge label label[:j] ("appl")
Update the old child's edge label to label[j:] ("ication")
Add the old child as split.children[label[j]] (keyed by 'i')
Add a new leaf for the remaining new key key[depth+j:] ("e") as split.children[key[depth+j]]
Replace the old child in the current node with split

The structural change in case 3 is O(1) pointer operations. The O(k) cost is the traversal to find the split point.

Edge split when inserting apple into a tree that has application: shared prefix appl becomes a new split node, old edge trims to ication, new leaf gets e The structural change is O(1): create split node, update two pointers, add one new leaf. The O(k) cost is the traversal to find the divergence point.

Delete Is O(k) with Optional Recompression

Walk to the terminal node (O(k)) tracking the parent-child path. Unmark is_end. Then:

If the node has no children, remove it from its parent. Walk up: if the parent is now a childless non-terminal, remove it too. Repeat.
If after pruning, a node is a non-terminal with exactly one child, merge it with that child (concatenate edge labels). This restores the compression invariant.

Step 1 is mandatory for correctness. A childless non-terminal node is unreachable and has no reason to exist. Step 2 is optional. Skipping it is technically correct but gives you a tree that gradually loses compression after heavy deletion. Fine for write-heavy then read-only workloads. Not great otherwise.

Prefix Enumeration Is O(|prefix| + output)

Walk to the subtree root that represents the end of the prefix (same as search, but stop when the prefix is consumed). Then collect all is_end nodes in that subtree via DFS. The DFS visits exactly the nodes in the result set. Cost: O(|prefix|) to locate, O(output) to collect.

One Structure, Every Language

The node stores edge_label (incoming edge string), children (map keyed by first character), and is_end. All implementations below provide insert, search, and prefix enumeration. The Python 3 version also shows delete in full.

class RadixNode:
    __slots__ = ("edge_label", "children", "is_end")

    def __init__(self, edge_label: str = "") -> None:
        self.edge_label = edge_label
        self.children: dict[str, "RadixNode"] = {}
        self.is_end = False


class RadixTree:
    def __init__(self) -> None:
        self.root = RadixNode()

    def insert(self, key: str) -> None:
        node = self.root
        depth = 0
        while depth < len(key):
            ch = key[depth]
            if ch not in node.children:
                leaf = RadixNode(key[depth:])
                leaf.is_end = True
                node.children[ch] = leaf
                return
            child = node.children[ch]
            label = child.edge_label
            j = 0
            while j < len(label) and depth + j < len(key) and label[j] == key[depth + j]:
                j += 1
            if j == len(label):
                depth += j
                node = child
                continue
            # Split edge at position j
            split = RadixNode(label[:j])
            child.edge_label = label[j:]
            split.children[label[j]] = child
            node.children[ch] = split
            if depth + j < len(key):
                leaf = RadixNode(key[depth + j:])
                leaf.is_end = True
                split.children[key[depth + j]] = leaf
            else:
                split.is_end = True
            return
        node.is_end = True

    def search(self, key: str) -> bool:
        node = self.root
        depth = 0
        while depth < len(key):
            ch = key[depth]
            if ch not in node.children:
                return False
            child = node.children[ch]
            label = child.edge_label
            if key[depth : depth + len(label)] != label:
                return False
            depth += len(label)
            node = child
        return node.is_end

    def starts_with(self, prefix: str) -> list[str]:
        node = self.root
        depth = 0
        while depth < len(prefix):
            ch = prefix[depth]
            if ch not in node.children:
                return []
            child = node.children[ch]
            label = child.edge_label
            j = 0
            while j < len(label) and depth + j < len(prefix) and label[j] == prefix[depth + j]:
                j += 1
            if depth + j == len(prefix):
                results: list[str] = []
                self._collect(child, prefix[:depth] + label, results)
                return results
            if j < len(label):
                return []
            depth += len(label)
            node = child
        results = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node: RadixNode, acc: str, results: list[str]) -> None:
        if node.is_end:
            results.append(acc)
        for child in node.children.values():
            self._collect(child, acc + child.edge_label, results)

    def delete(self, key: str) -> bool:
        path: list[tuple[RadixNode, str]] = []
        node = self.root
        depth = 0
        while depth < len(key):
            ch = key[depth]
            if ch not in node.children:
                return False
            child = node.children[ch]
            label = child.edge_label
            if key[depth : depth + len(label)] != label:
                return False
            path.append((node, ch))
            depth += len(label)
            node = child
        if not node.is_end:
            return False
        node.is_end = False
        # Prune childless non-terminal nodes upward
        while path and not node.children and not node.is_end:
            parent, ch = path.pop()
            del parent.children[ch]
            node = parent
        # Merge node with its only child if it is now non-terminal and single-child
        if path and not node.is_end and len(node.children) == 1:
            (only_ch, only_child), = node.children.items()
            node.edge_label += only_child.edge_label
            node.children = only_child.children
            node.is_end = only_child.is_end
        return True

Where Radix Trees Ship in Production

URL routing with path parameters. An HTTP router must map /users/42/posts/7 to a handler function. A hash map fails immediately: the actual URL was never inserted, only the pattern /users/:id/posts/:postId. A radix tree walks the path segments, recognizing literal characters ('/', 'u', 's', ...) until it hits a node whose child starts with ':' or '*', then switches into parameter-capture mode. Gin and Echo register hundreds of routes and resolve any incoming URL in O(|path|) with zero allocations.

IP longest-prefix-match routing. Network routers must find the most specific matching subnet for each packet's destination address. For IP, that means the longest prefix of the destination that appears in the routing table. A radix tree (at the bit level: each node branches on one bit of the IP address) walks the address from most-significant to least-significant bit, tracking the last match seen. When the walk ends, the last match is the longest prefix. Nginx uses this for IP access control; hardware routers use ternary CAM (essentially hardware radix trees) for line-rate routing.

Autocomplete and dictionary prefix search. Any system that suggests completions for a partial word uses some form of prefix index. A hash map cannot answer "give me all words starting with 'appl'". You would have to scan everything. A sorted array can, with O(log n + output) binary search. A radix tree answers in O(|prefix| + output), and for large key sets with shared prefixes (a dictionary, a domain name catalog, a code identifier index), uses far less memory than a standard trie.

DNS resolution. Domain names are traversed in reverse (com → example → www) for zone matching. Each label is a segment. Radix trees over domain name labels let resolvers find the deepest authoritative zone for any query in O(|name|).

Key-value store namespacing. Redis Streams uses a radix tree (antirez wrote rax himself) to store consumer group state keyed by stream IDs. Package managers use them to find all packages in a namespace (e.g., all packages starting with @babel/).

Five Signals to Reach for a Radix Tree

The unifying pattern: you need to match partial keys against a stored set, not whole keys. Hash maps and exact-match indexes don't help here.

The problem says "find all keys with prefix X" and a hash set is not enough.
You need longest-prefix-match: given a query, find the stored key that is the longest prefix of it.
The keys share substantial common prefixes and a standard trie's memory is a concern.
You need ordered enumeration within a namespace (all keys from "aa" to "az" in alphabetical order).
You need wildcard or parameterized matching along a hierarchical path.

Worked example 1: autocomplete system.

Problem: given a dictionary of n words, answer autocomplete queries. Each query gives a prefix p and wants all words starting with p.

Brute force: scan all n words and check word.startswith(p). O(n × |p|) per query.

Hash set: no help. You cannot enumerate all strings with a given prefix from a hash set without scanning everything.

Sorted array + binary search: binary search for the first word ≥ p and the first word ≥ p+1 (the next prefix), then collect. O(log n + output). Reasonable, but building the prefix boundary requires careful comparison logic and the structure does not support insertions efficiently.

Radix tree: walk p through the tree in O(|p|). If no node for p exists, return empty. If a node exists, DFS from there to collect all is_end nodes. O(|p| + output). Insertions and deletions are O(|word|) without rebalancing. The structure is correct by construction: the tree encodes exactly the set of stored words.

Worked example 2: HTTP router.

Problem: given a set of route patterns like /api/users, /api/users/:id, /api/posts, /api/posts/:id/comments, map incoming request paths to handlers.

Hash map: fails immediately. The incoming path /api/users/42 does not appear as a key. You would need to try all routes and check each one, which is O(routes) per request.

Regex: works but O(routes × |path|) per request (every regex must be tried).

Radix tree: insert each route pattern. Literal segments become edge labels. When the pattern reaches a :id segment, that position in the tree gets a special wildcard child. Lookup walks the tree character by character, and when it hits a wildcard, it consumes the segment as a parameter. O(|path|) per lookup, regardless of how many routes are registered. This is exactly how httprouter works and why it is orders of magnitude faster than regex-based routers under load.

HTTP router radix tree: four routes share /api/ prefix, branch to users and posts, each with wildcard parameter children for :id Four routes, one shared prefix. The wildcard :id node captures whatever segment is in that position of the incoming URL, in O(|path|) with zero allocations.

When to Use the Radix Tree Data Structure

A radix tree is a trie where every chain of single-child, non-terminal nodes collapses into a single edge with a multi-character label. One rule, enormous impact.
The compression invariant (every internal non-terminal node has at least two children) guarantees O(n) total nodes for n stored keys.
All operations (search, insert, delete) are O(k) where k is the key length. Characters are never re-examined. No amortized tricks: every operation is worst-case O(k).
Insert has one non-obvious case: the edge split. Finding the divergence point is O(k). The structural change is O(1). Three pointer updates and you are done.
Delete requires pruning childless non-terminal nodes and optionally merging newly single-child non-terminal nodes. Step 1 is mandatory. Step 2 is optional but you will regret skipping it eventually.
Real deployments: httprouter/Gin/Echo (Go HTTP routing), nginx (IP access control), Linux page cache (since 2.5), Redis Streams (antirez/rax), network routing tables.
Not a replacement for hash maps on exact-match workloads. The radix tree wins on prefix queries, longest-prefix-match, and structured key hierarchies.

If you want to practice recognizing when to reach for a radix tree versus a standard trie or a sorted map, SpaceComplexity runs voice-based DSA mock interviews where an AI interviewer probes your choice of data structure and asks you to justify the trade-offs out loud. The kind of question that trips people up is exactly this one: "You said trie. What's the memory cost, and how would you reduce it?"

For more on the underlying trie mechanics, see Trie Data Structure: How Prefix Trees Make Autocomplete Fast. For the hash map internals that back each node's children map, see Hash Map Time Complexity: How O(1) Really Works (and When It Doesn't). If you're working on suffix-based indexing alongside prefix indexing, Suffix Array: How 4n Bytes Beat the Suffix Tree at Its Own Game covers the complementary structure.