Skip List Data Structure: Randomness Gets You O(log n) Without a Single Rotation

Balanced binary search trees (AVL, red-black) guarantee O(log n) search, insert, and delete. They earn that guarantee through rotations, color flips, and rebalancing logic that nobody writes from memory on a whiteboard. Seriously, close your eyes and implement a red-black tree insert. Go ahead. I'll wait.

Thanos approving a balanced AVL tree while rejecting an unbalanced BST. Skip lists just flip a coin instead. Skip lists watching you struggle with left-right rotations

In 1990, William Pugh published a paper in Communications of the ACM with a radical alternative: what if you could get the same expected O(log n) performance by flipping coins instead of balancing trees? No rotations. No color flips. Just vibes and probability theory.

That alternative is the skip list, a probabilistic data structure built from layered linked lists. The mental model is a subway system. The bottom level is the local train that stops at every station. Each level above is an express line that skips more and more stops. To find your station, you ride the fastest express until you overshoot, drop to a slower line, and repeat until the local delivers you to the exact stop. If you have ever transferred from an express to a local on the NYC subway, congratulations: you already understand skip list search.

You reach for a skip list when you need a sorted collection with fast search, insert, and delete, and you want the implementation to be simple enough to get right under pressure. Redis uses one to back sorted sets. Java ships ConcurrentSkipListMap in the standard library. LevelDB and RocksDB use skip lists as their in-memory write buffer. The reason is always the same: a skip list is dramatically simpler to implement than a balanced BST, especially when concurrency enters the picture.

How a skip list data structure works internally

A skip list is a hierarchy of sorted linked lists stacked on top of each other. Every element lives in the bottom list (level 0). Some elements are "promoted" to level 1, fewer to level 2, and so on. The promotion is random, decided independently for each element by a coin flip. It is the data structure equivalent of getting promoted because you won a raffle, not because you filled out the right paperwork.

Every element lives at the bottom, and some ride the express

Picture a sorted linked list of the integers 3, 7, 12, 19, 22, 26, 31. That is level 0, the local train. Now copy some of those nodes up to level 1. Maybe 7, 19, and 26 get promoted. Copy a subset of those to level 2. Maybe just 19, because 19 is apparently very lucky with coin flips.

Skip list with three levels showing nodes 3, 7, 12, 19, 22, 26, 31 at level 0, nodes 7, 19, 26 at level 1, and node 19 at level 2 The subway map. Level 0 is the local. Level 2 is the express that stops only at 19.

Each node contains a key (or key-value pair) and an array of forward pointers, one per level the node participates in. A node at level 2 has three forward pointers: one for level 0, one for level 1, and one for level 2. A node at level 0 has just one. Think of forward pointers like arms reaching out to shake hands with the next node on each line.

Memory layout of skip list node 19 showing forward pointer array with forward[0] pointing to node 22, forward[1] to node 26, and forward[2] to NIL Node 19 in memory: three forward pointers because it won three coin flips.

The coin flip that builds the layers

When you insert a new element, you need to decide how tall its tower of pointers should be. You do not plan. You do not measure. You flip coins.

randomLevel():
    level = 0
    while random() < p and level < MAX_LEVEL:
        level = level + 1
    return level

With p = 0.5, half of all nodes reach level 1, a quarter reach level 2, an eighth reach level 3, and so on. The expected height of any node is 1/(1-p) levels. For p = 0.5, that is 2 levels on average. For p = 0.25 (what Redis uses), it is roughly 1.33. Most nodes are short. A few are tall. It is the data structure version of a city skyline.

Geometric probability distribution showing level 0 has all n nodes, level 1 has n/2, level 2 has n/4, level 3 has n/8, forming a pyramid shape The pyramid of promotion. Each level keeps half the crowd. By level 4, you are basically VIP.

The choice of p is a space-time knob. A higher p gives more express lanes (slightly faster search) but costs more pointers per node. A lower p saves space at the cost of more comparisons. In practice, p = 0.5 and p = 0.25 are the two common choices, and the search time difference between them is small.

MAX_LEVEL is set to log_{1/p}(N) where N is the largest number of elements you expect. For p = 0.5 and a million elements, MAX_LEVEL = 20 is plenty. Redis uses MAX_LEVEL = 32 with p = 0.25, which covers up to 4^32 elements (far more than fits in memory).

The sentinel that anchors everything

The skip list has a special header node (sometimes called the sentinel) that does not hold a real key. It has forward pointers at every level up to MAX_LEVEL. Every search starts here. Think of it as the grand central station of the subway map, connecting all the express lines. It is always there, even when the rest of the list is empty. Very loyal.

The skip list also tracks its current highest occupied level. Searches start from this level, not from MAX_LEVEL, so empty upper levels cost nothing.

Skip list time complexity: what every operation costs

Operation	Average Time	Worst-Case Time	Space
Search	O(log n)	O(n)	O(1)
Insert	O(log n)	O(n)	O(log n) expected for new node
Delete	O(log n)	O(n)	O(1)
Space (total structure)	O(n)	O(n log n)

The worst case, O(n), happens when every node lands at level 0. That is the scenario where you flip tails on every single coin for every single node. The probability of that is (1-p)^n, which shrinks so fast it makes lottery odds look generous. With high probability (at least 1 - 1/n), the height stays below 2 log_{1/p}(n) and all operations run in O(log n).

Search rides the express lanes down

Search for key k starts at the header's highest active level. At each level, you walk forward as long as the next node's key is less than k. When the next node is greater or equal (or nil), you drop down one level and repeat. When you reach level 0, you check if the node directly ahead holds k. It is the algorithmic version of "I went too far, let me take the side streets."

search(list, key):
    current = list.header
    for i from list.level down to 0:
        while current.forward[i] != NIL and current.forward[i].key < key:
            current = current.forward[i]
    current = current.forward[0]
    return current != NIL and current.key == key

Here is the search path for finding 22 in our example. Watch how we ride express level 2 to 19, realize we overshot, drop down, and zero in:

Three levels, four hops, done. Binary search wishes it had this much style.

The search touches at most O(1/p) nodes per level and traverses O(log n) levels, giving expected O(log n) total comparisons. The formal proof follows below.

Insert: search, flip, splice

Insertion does three things in sequence. Search, gamble, wire up.

First, search for the insertion point exactly as above, but record an update[] array along the way. update[i] is the last node you visited at level i before dropping down. These are the nodes whose forward pointers need to change.

Second, generate a random level for the new node. If this level exceeds the current list level, extend the update array so the extra levels point to the header.

Third, splice the new node into every level from 0 up to its generated level. For each level i, set the new node's forward[i] to update[i].forward[i], then set update[i].forward[i] to the new node.

insert(list, key):
    update = array of size MAX_LEVEL + 1
    current = list.header
    for i from list.level down to 0:
        while current.forward[i] != NIL and current.forward[i].key < key:
            current = current.forward[i]
        update[i] = current
    current = current.forward[0]
    if current != NIL and current.key == key:
        return                          // duplicate; skip or update value
    newLevel = randomLevel()
    if newLevel > list.level:
        for i from list.level + 1 to newLevel:
            update[i] = list.header
        list.level = newLevel
    newNode = createNode(key, newLevel)
    for i from 0 to newLevel:
        newNode.forward[i] = update[i].forward[i]
        update[i].forward[i] = newNode

The splice step for inserting 15 at level 1 looks like this:

Before and after inserting node 15 into a skip list at level 1, showing pointer rewiring from nodes 7 and 12 Node 15 slides in. The old connections get redirected. Nobody else even notices.

The cost of insertion is dominated by the search phase: O(log n) expected. The splice itself touches at most newLevel + 1 pointers, which is O(log n) in the worst case but O(1) in expectation (since the expected level is 1/(1-p)).

Delete: search, unlink, trim

Deletion mirrors insertion. Search for the target key while building the update[] array. If the target exists, unlink it at every level. Then trim the list's recorded level if the top levels are now empty. It is insertion played in reverse, like watching someone un-eat spaghetti.

delete(list, key):
    update = array of size MAX_LEVEL + 1
    current = list.header
    for i from list.level down to 0:
        while current.forward[i] != NIL and current.forward[i].key < key:
            current = current.forward[i]
        update[i] = current
    current = current.forward[0]
    if current == NIL or current.key != key:
        return false
    for i from 0 to list.level:
        if update[i].forward[i] != current:
            break
        update[i].forward[i] = current.forward[i]
    free(current)
    while list.level > 0 and list.header.forward[list.level] == NIL:
        list.level = list.level - 1
    return true

Deletion costs O(log n) expected time (dominated by the search) and adjusts at most O(level) pointers during the unlink.

Why O(log n) holds: the backward analysis

This is the part where we prove that the coin-flip chaos actually works. The argument is due to Pugh and it is genuinely clever.

Instead of analyzing the forward search path (header to target), walk the path in reverse: start at the found node and trace back to the header. At each step along this backward walk, you either go up one level or go left one position. Why reverse? Because going backward, each node's promotion decision is independent and memoryless, which makes the math clean.

The key insight: when you arrive at a node during the backward walk, you can treat the node's level as if it were being decided right now. If the node was promoted past the current level (probability p), you go up. If it was not (probability 1-p), you go left.

Backward analysis of skip list search showing the reverse path from found node to header, with UP (probability p) and LEFT (probability 1-p) decisions at each step Walking backwards through the search. At each node: did it get promoted (go up) or not (go left)?

Let C(k) be the expected number of nodes examined while climbing k levels. The recurrence is:

C(k) = p · (1 + C(k-1)) + (1-p) · (1 + C(k))

The first term: with probability p, you go up (cost 1 step) and now need to climb k-1 more levels. The second term: with probability 1-p, you go left (cost 1 step) and still need to climb k levels.

Solve it:

C(k) = 1 + p·C(k-1) + (1-p)·C(k)
C(k) - (1-p)·C(k) = 1 + p·C(k-1)
p·C(k) = 1 + p·C(k-1)
C(k) = 1/p + C(k-1)

With C(0) = 0, this telescopes to C(k) = k/p. Four lines of algebra. That is the whole proof. If your algorithms professor made this take 45 minutes, they owe you an apology.

The expected height of the skip list is log_{1/p}(n). At the topmost occupied level, there is an additional expected cost of 1/(1-p) to traverse left to the header. So the total expected comparisons are:

Total = C(log_{1/p}(n)) + 1/(1-p)
      = log_{1/p}(n) / p + 1/(1-p)

For p = 0.5: about 2·log₂(n) + 2 comparisons. For p = 0.25: 4·log₄(n) + 4/3 = 2·log₂(n) + 4/3 comparisons (since log₄(n) = log₂(n)/2).

Both choices give roughly 2·log₂(n) comparisons. The search cost is O(log n) regardless of p, because p only affects the constant factor. You pick p, and the math says "sure, whatever, still logarithmic."

Why the height is O(log n) with high probability

"But what if you get unlucky?" Good question. The probabilistic argument for the height bound uses a union bound, and it shows that getting that unlucky is absurdly unlikely. A single node reaches level l with probability p^l. The probability that any of the n nodes reaches level l is at most n · p^l (by the union bound).

Set l = c · log_{1/p}(n):

P(height ≥ l) ≤ n · p^(c · log_{1/p}(n))
              = n · n^(-c)
              = n^(1-c)

For c = 2, this is 1/n. So with probability at least 1 - 1/n, the skip list's height is at most 2·log_{1/p}(n). The O(n) worst case is astronomically unlikely. You are more likely to get struck by lightning while winning the lottery while a meteor lands on your house.

Space costs O(n), and you can prove it on a napkin

Each node at level 0 gets promoted to level 1 with probability p, to level 2 with probability p², and so on. The expected number of forward pointers per node is:

1 + p + p² + p³ + ... = 1/(1-p)

For n nodes, the expected total pointers are n/(1-p).

With p = 0.5: 2n pointers. Each node averages 2 forward pointers.
With p = 0.25: (4/3)n ≈ 1.33n pointers. This is why Redis chose p = 0.25.

The expected space is O(n) with a constant factor controlled by p. Compared to a balanced BST with exactly 2 child pointers per node (total 2n), a skip list with p = 0.25 actually uses fewer pointers on average. The lazy probabilistic structure costs less memory than the uptight deterministic one. Let that sink in.

One Structure, Every Language

Every skip list implementation below includes the four essential operations: randomLevel, search, insert, and delete. The node holds an integer key and an array of forward pointers. The sentinel header anchors the top-left corner. The code is the same algorithm in every language. If you can read one, you can read all of them. If you cannot read any of them, start with Python and work your way down.

import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)

class SkipList:
    MAX_LEVEL = 16
    P = 0.5

    def __init__(self):
        self.level = 0
        self.header = Node(float('-inf'), self.MAX_LEVEL)

    def random_level(self):
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].key < key:
                current = current.forward[i]
        current = current.forward[0]
        return current is not None and current.key == key

    def insert(self, key):
        update = [None] * (self.MAX_LEVEL + 1)
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].key < key:
                current = current.forward[i]
            update[i] = current
        current = current.forward[0]
        if current and current.key == key:
            return
        new_level = self.random_level()
        if new_level > self.level:
            for i in range(self.level + 1, new_level + 1):
                update[i] = self.header
            self.level = new_level
        new_node = Node(key, new_level)
        for i in range(new_level + 1):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def delete(self, key):
        update = [None] * (self.MAX_LEVEL + 1)
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].key < key:
                current = current.forward[i]
            update[i] = current
        current = current.forward[0]
        if not current or current.key != key:
            return False
        for i in range(self.level + 1):
            if update[i].forward[i] != current:
                break
            update[i].forward[i] = current.forward[i]
        while self.level > 0 and self.header.forward[self.level] is None:
            self.level -= 1
        return True

What problems does a skip list solve?

Now that you know how they work, the important question: when do you actually use one? Skip lists are the right tool for a specific family of problems.

Sorted collections with efficient mutation. When you need insert, delete, and search all in O(log n), and the data must stay sorted, a skip list competes directly with balanced BSTs. The implementation is far simpler, which matters when you are writing it by hand (in an interview, in an embedded system, or in a language without a standard library tree).

Range queries over sorted data. Once you find the start of a range in O(log n), you walk the level-0 linked list to enumerate the range in O(k) where k is the number of results. This is exactly how Redis ZRANGEBYSCORE works.

Concurrent sorted maps. Concurrency is the skip list's strongest argument over balanced trees. A BST rotation can touch the root, meaning any concurrent insert might need to lock the entire tree. A skip list insert only modifies O(log n) pointers, and these pointers are spread across levels that can be updated independently. Lock-free implementations become practical. Java's ConcurrentSkipListMap exists for exactly this reason.

Write-heavy in-memory indexes. LevelDB, RocksDB, and other LSM-tree databases use skip lists for their memtable (the in-memory write buffer). Writes land in the skip list in O(log n), and when the memtable is full, a level-0 walk produces a sorted stream for flushing to disk.

Priority queues with arbitrary deletion. A binary heap gives O(log n) insert and extract-min, but deleting an arbitrary element costs O(n) because you have to find it first. A skip list gives O(log n) for all three operations. The heap has a search problem, and the skip list does not.

How to spot a skip list problem in an interview

Signal 1: You need ordered operations, not just point lookups. If you only need insert and lookup by key, a hash map is simpler and faster. The moment you need "find the smallest key greater than X" or "give me all keys between A and B," a hash map is useless and a skip list (or balanced BST) becomes necessary.

Signal 2: Simplicity of implementation matters. If you are in an interview, writing from scratch in an unfamiliar language, or working in a codebase that cannot pull in a balanced tree library, the skip list wins on simplicity. The insert is essentially "do a search, flip a coin for the level, splice in." No rotations, no color flips, no case analysis. You could explain the entire algorithm to someone at a bar, and they might actually follow along (assuming they are a nerd at a bar, which, if you are reading this article, they probably are).

Signal 3: Concurrency is a factor. If multiple threads or goroutines need to read and write the sorted collection simultaneously, a skip list is far easier to make lock-free than a balanced BST.

Worked example: streaming leaderboard with range queries

Problem: You are building a real-time game leaderboard. Players' scores update constantly. The system must support: (1) update a player's score, (2) look up a player's rank, (3) return the top K players, (4) return all players with scores between A and B. All operations should be fast.

Why a skip list fits: A hash map handles player-to-score lookups in O(1) but cannot answer range or rank queries. A sorted array supports binary search but makes insertions O(n). A balanced BST works but is complex to augment for rank queries.

A skip list handles all four requirements naturally. Store (score, playerID) pairs in sorted order. Updates are O(log n) delete-then-insert. Range queries walk level 0 after an O(log n) search to find the start. For rank queries, you can augment each forward pointer with a "span" (the number of level-0 nodes it skips over), exactly as Redis does for its ZRANK command. This gives O(log n) rank lookups without any tree rotation bookkeeping.

Worked example: in-memory write buffer for a key-value store

Problem: You are building a simple key-value store. Writes must be fast. Reads must return the latest value. Periodically, the in-memory data gets flushed to disk as a sorted file. You need the flush to produce keys in sorted order.

Why a skip list fits: A hash table gives O(1) writes but produces unsorted output, requiring an O(n log n) sort at flush time. A balanced BST works but is harder to implement correctly, and concurrent writes require careful locking.

A skip list gives O(log n) writes, and the flush is just a level-0 walk that produces sorted output in O(n). If you need concurrent writes (and you do, because writes arrive from multiple client threads), the skip list's structure makes a lock-free or fine-grained-locking implementation practical. This is precisely why LevelDB chose a skip list for its memtable.

The quick version (for people who scroll to the bottom)

A skip list is a hierarchy of sorted linked lists where each node is randomly promoted to higher levels with probability p.
Search, insert, and delete all run in O(log n) expected time. The worst case is O(n) but exponentially unlikely.
The backward analysis proves search is O(log n): climbing k levels costs k/p expected comparisons, and the expected height is log_{1/p}(n).
Expected space is O(n) with n/(1-p) total pointers. p = 0.5 gives 2n pointers; p = 0.25 gives 1.33n.
Skip lists are simpler to implement than AVL or red-black trees. No rotations, no rebalancing, no case analysis.
They are far easier to make lock-free than balanced BSTs.
Real-world usage: Redis sorted sets, Java ConcurrentSkipListMap, LevelDB/RocksDB memtables.
Reach for a skip list when you need sorted operations, range queries, or concurrent access, and simplicity matters. Or when you just really do not want to implement a red-black tree.

If you want to practice explaining skip list operations out loud (the way you'd need to in an interview), SpaceComplexity runs voice-based mock interviews where you walk through your data structure choices in real time and get rubric-based feedback on how clearly you communicated the tradeoffs.

If you're comparing this structure to other options, our deep dives on balanced BSTs and why they guarantee O(log n), hash maps and when O(1) breaks down, and the binary search tree that underpins all of them cover the alternatives in the same depth.