Dijkstra's Algorithm: Relaxation, Priority Queues, and 14 Implementations

Your GPS finds the fastest route in under a second. Packet routing picks paths through thousands of routers in milliseconds. At the core of both is a 1956 idea that Edsger Dijkstra designed without pencil and paper in about twenty minutes, sitting in a cafe with his fiancee, and published in a three-page paper in 1959.

Dijkstra's algorithm finds the shortest path from a single source vertex to every other vertex in a weighted graph, as long as no edge weight is negative. One-line mental model: BFS that respects edge weights. Instead of visiting the node discovered earliest, it always visits the node with the smallest current known distance. One greedy choice, repeated until every reachable vertex is settled.

Reach for Dijkstra when a problem involves a weighted graph, non-negative costs, and asks for minimum cost of any kind from a fixed starting point. That covers GPS routing, network latency analysis, game pathfinding, and a wide slice of the "weighted graph" category on LeetCode.

Bae: Come over. Dijkstra: But there are so many routes. Bae: My parents aren't home. Dijkstra: opens Wikipedia

Relatable. The man invented the algorithm but still needed to run it before committing.

What Goes In

Dijkstra takes a directed or undirected graph G = (V, E) with non-negative edge weights w(u, v) >= 0, and a source vertex s. It produces an array dist[] where dist[v] is the total weight of the cheapest path from s to v. Vertices unreachable from s get dist[v] = infinity.

Here is the graph we will trace through:

      2         5
 (A) ---> (B) ---> (D)
  |         \
  4           \ 1
  v             v
 (C) ---------> (E)
          3

Source: A. Initial state: dist[A] = 0, all others = infinity.

Graph with five nodes A through E. Node A is highlighted as the source with dist=0. All other nodes labeled dist=infinity, connected by weighted directed edges.

Every node starts at infinity. One greedy loop will sort this out.

Relaxation: The Only Move That Matters

The single operation Dijkstra repeats is relaxation. For an edge from u to v with weight w:

if dist[u] + w < dist[v]:
    dist[v] = dist[u] + w

If going through u gives v a cheaper path than what we currently know, update. You start with infinity everywhere and tighten downward. Relaxation can only decrease a distance, never increase it.

Every shortest-path algorithm chooses a different order to relax edges. Bellman-Ford brute-forces it: relax every edge, V-1 times. Dijkstra's insight is that if you always relax edges from the cheapest unprocessed vertex first, each edge needs to be relaxed only once.

That's it. That's the whole idea. Twenty minutes in a cafe well spent.

Why the Greedy Bet Pays Off

Dijkstra commits to dist[u] being the final shortest distance the moment it extracts u from the min-heap. This is the greedy invariant: when a vertex u is extracted from the priority queue with distance d, d is the true shortest-path distance from the source.

The proof is by induction on extraction order. The base case is the source, extracted with distance 0. Correct by definition.

For the inductive step: suppose Dijkstra extracts u with distance d[u], but the true shortest path has cost d*[u] < d[u]. Then some path P from s to u achieves d*[u]. That path must leave the "settled" set (already extracted vertices) at some point and cross into the unsettled set. Let x be the last settled vertex on P and y be the first unsettled one.

By induction, x was correctly settled. When we settled x, we relaxed all its outgoing edges, so dist[y] <= dist[x] + w(x, y) = d*[x] + w(x, y). The rest of P from y to u has total weight >= 0 (all weights are non-negative), so d*[u] >= dist[y]. But Dijkstra extracted u before y, meaning dist[u] <= dist[y]. Chaining: d*[u] >= dist[y] >= dist[u], which contradicts d*[u] < dist[u].

The proof depends entirely on one phrase: "the rest of P has weight >= 0." Remove the non-negative weight assumption and the proof breaks at exactly that step, which is why Dijkstra fails on graphs with negative edges.

The Case Against Negative Weights

Here is a concrete failure:

A -> B: weight 2
A -> D: weight 3
D -> B: weight -5
B -> C: weight 1

True shortest distances from A: dist[B] = -2 (path A -> D -> B = 3 + (-5)), dist[C] = -1.

Dijkstra with a settled set:

Extract A (dist 0). Relax: dist[B] = 2, dist[D] = 3. Heap: [(2,B), (3,D)].
Extract B (dist 2). Mark B settled. Relax: dist[C] = 3. Heap: [(3,C), (3,D)].
Extract C (dist 3). Mark C settled. Heap: [(3,D)].
Extract D (dist 3). Try to relax D -> B: 3 + (-5) = -2 < dist[B] = 2. But B is already settled. Skip.

Result: dist[B] = 2 (wrong), dist[C] = 3 (wrong). The algorithm settled B too early because it could not know that a future path through D would be cheaper.

Graph showing A, B, C, D with the -5 edge highlighted in red. Right panel shows Dijkstra's incorrect execution: B settled at 2, the D→B relaxation blocked because B is already in the settled set.

B is settled. B does not want to be updated. B has made its decision and is not taking questions.

For negative edge weights, use Bellman-Ford (O(VE)) or SPFA. For negative cycles, shortest paths are undefined.

One exception matters: if your graph is a DAG (directed, no cycles), you can handle any edge weights by processing vertices in topological order and relaxing once per edge. That runs in O(V+E), faster than Dijkstra, and handles negative weights just fine. This is dynamic programming on a DAG. For the DP framing, see the dynamic programming framework.

How the Frontier Advances

The algorithm maintains:

dist[]: current best known distance from source to each vertex.
A min-heap: (distance, vertex) pairs, ordered by distance.
A stale-check: when popping (d, u), skip if d > dist[u].

Let's trace our example graph (A -> B:2, A -> C:4, B -> D:5, B -> E:1, C -> E:3):

Initial. dist = [A:0, B:∞, C:∞, D:∞, E:∞]. Heap: [(0, A)].

Extract (0, A). Relax A -> B: dist[B] = 2. Relax A -> C: dist[C] = 4. Heap: [(2, B), (4, C)].

Extract (2, B). Not stale (2 == dist[B]). Relax B -> D: dist[D] = 7. Relax B -> E: dist[E] = 3. Heap: [(3, E), (4, C), (7, D)].

Extract (3, E). Not stale. No outgoing edges. Heap: [(4, C), (7, D)].

Extract (4, C). Not stale. Relax C -> E: 4 + 3 = 7 > dist[E] = 3. No update. Heap: [(7, D)].

Extract (7, D). Not stale. No outgoing edges. Done.

Final: dist = [A:0, B:2, C:4, D:7, E:3].

Step-by-step table showing each extraction, the dist array values after relaxation, and the heap contents at each step. Green values indicate newly updated distances.

Notice that C's attempt to update E fails. E already knows a better route. C is snubbed.

Complexity: Why O((V+E) log V)

Implementation	Extract-Min	Relax (Decrease-Key)	Total
Simple array	O(V) per call	O(1)	O(V²)
Binary heap	O(log V) per call	O(log V)	O((V+E) log V)
Fibonacci heap	O(log V) amortized	O(1) amortized	O(E + V log V)

With a binary heap:

We call extract-min at most V times under decrease-key, or up to O(E) times under lazy deletion. Each call costs O(log(heap size)).
We perform at most E relaxations. Each relaxation either calls decrease-key or pushes a new entry. Both cost O(log V).
Total: O(V log V + E log V) = O((V+E) log V).

For connected graphs, E >= V-1, so O(E log V) is a common simplification. For dense graphs where E = O(V²), the binary heap gives O(V² log V), which is worse than the O(V²) array approach. This is why dense-graph implementations sometimes skip the heap entirely.

Fibonacci heaps achieve O(1) amortized decrease-key via lazy cascading cuts. The total becomes O(E + V log V). Theoretically better for sparse graphs where E = o(V log V). In practice, the constant factor overhead makes Fibonacci heaps slower than binary heaps on any real hardware for graphs with fewer than millions of vertices. Use them in algorithm papers, not in production.

Space complexity: the distance array takes O(V), the heap takes O(V) under decrease-key or O(E) under lazy deletion, and the adjacency list takes O(E). Total: O(V + E).

Why Everyone Uses Lazy Deletion

Decrease-key is the textbook approach. The heap holds exactly V entries, one per vertex. When you find a shorter path to v, locate v in the heap and decrease its priority. Requires index-based heap access. Exact O((V+E) log V). Harder to code, rarely implemented outside competitive programming libraries.

Lazy deletion is what everyone actually uses. Every time you find a shorter path to v, push a new (distance, v) pair onto the heap. Do not delete the old one. When you pop (d, u), check whether d > dist[u]. If so, the entry is stale. Skip it.

d, u = heapq.heappop(heap)
if d > dist[u]:
    continue  # stale: a better path was already processed

The heap can grow to O(E) entries with lazy deletion. But since E <= V², log(E) <= log(V²) = 2 log(V). The asymptotic complexity is the same, and lazy deletion is simpler, easier to debug, and competitive in practice. All fourteen implementations below use lazy deletion.

Heap contents showing a stale (5,B) entry below a live (2,B) entry. Right panel shows the if d > dist[u]: continue check that silently discards it.

The stale entry never gets processed. It just sits in the heap, aging quietly, until it is popped and immediately discarded. This is fine.

How to Recover the Actual Path

The distance array tells you the cost. To recover the path itself, track a prev[] array during relaxation:

prev = {source: None}

# During relaxation:
if dist[u] + weight < dist[v]:
    dist[v] = dist[u] + weight
    prev[v] = u          # we arrived at v through u
    heapq.heappush(heap, (dist[v], v))

# Reconstruct path to target:
path = []
node = target
while node is not None:
    path.append(node)
    node = prev.get(node)
path.reverse()
# If prev does not contain target, it is unreachable.

Graph with dashed green arrows showing prev[] pointers from each node back to the node through which it was reached. A table on the right shows prev[B]=A, prev[C]=A, prev[D]=B, prev[E]=B.

Walk prev[] backwards from the target and reverse the list. That's the path. If target is not in prev, it's unreachable. Return -1, apologize to the interviewer.

One Structure, Every Language

The graph is an adjacency list. Every implementation uses lazy deletion with a min-heap. Edge weights are non-negative integers.

heapq is a min-heap. Push (distance, node) tuples; Python compares tuples lexicographically.

import heapq

def dijkstra(
    graph: dict[int, list[tuple[int, int]]],
    source: int,
    num_nodes: int,
) -> list[float]:
    dist = [float('inf')] * num_nodes
    dist[source] = 0
    heap = [(0, source)]

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, weight in graph.get(u, []):
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                heapq.heappush(heap, (dist[v], v))

    return dist

When to Use Dijkstra's Algorithm

The canonical use case for Dijkstra's algorithm is single-source shortest path on a weighted graph with non-negative edge weights.

Weighted grid pathfinding. Treat each grid cell as a vertex, adjacent cells as edges. The edge weight is the movement cost. Dijkstra outperforms BFS as soon as different cells have different costs (terrain types, elevation changes, traffic).

Network latency and routing. Find the minimum-latency path between two routers. Every router is a vertex, every link is a weighted edge.

Multi-source shortest paths. Add a virtual source vertex with zero-weight edges to each real source. Run Dijkstra from the virtual source. The distances you get are the minimum distance from any real source.

State-space shortest paths. When your problem tracks state beyond just "which node," create a composite state. For example, tracking fuel remaining: each state is (node, fuel), and you run Dijkstra on the state graph. The number of vertices in this expanded graph is O(V * F) where F is the number of fuel levels, but the algorithm structure is identical.

When NOT to use it:

Negative edge weights. Use Bellman-Ford.
Unweighted graph. Use BFS. Same asymptotic bounds, much simpler.
All-pairs shortest paths. Run Dijkstra V times, or use Floyd-Warshall on dense graphs.
DAG with any edge weights. Topological sort plus one relaxation pass runs in O(V+E).

How to Spot a Dijkstra Problem

The pattern signals:

Weighted graph (if unweighted, BFS is enough)
"Minimum cost/time/distance/effort to reach X"
Non-negative costs (all weights are travel times, fees, distances, probabilities: inherently positive)
Single source (one fixed starting point, or multiple sources collapsible to a virtual one)
The cost accumulates or bottlenecks along a path, and you want the optimal path

Two worked examples, from problem statement to implementation.

Example 1: Network Delay Time (LeetCode 743)

Problem. You have n nodes labeled 1 through n. A list of directed edges times[i] = (u, v, w) represents a signal traveling from u to v in time w. Send a signal from node k. Return the minimum time until all nodes have received the signal, or -1 if any node cannot be reached.

Why Dijkstra. This is textbook single-source shortest path. Run from k. The answer is max(dist[1..n]). If any dist[v] = infinity, return -1.

import heapq
from typing import List

def networkDelayTime(times: List[List[int]], n: int, k: int) -> int:
    graph = [[] for _ in range(n + 1)]
    for u, v, w in times:
        graph[u].append((v, w))

    dist = [float('inf')] * (n + 1)
    dist[k] = 0
    heap = [(0, k)]

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, weight in graph[u]:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                heapq.heappush(heap, (dist[v], v))

    max_dist = max(dist[1:])
    return max_dist if max_dist < float('inf') else -1

The dist[0] slot is unused because nodes are 1-indexed.

Example 2: Path With Minimum Effort (LeetCode 1631)

Problem. A 2D grid of integer heights. Move to adjacent cells (up/down/left/right). The "effort" of a path is the maximum absolute height difference between any two consecutive cells. Find the minimum effort to travel from the top-left to the bottom-right cell.

Why Dijkstra. The "effort" is a bottleneck metric: the worst single step, not the total. But Dijkstra still applies. Define dist[row][col] as the minimum achievable effort to reach that cell. The relaxation changes: instead of summing weights, you take the max.

import heapq
from typing import List

def minimumEffortPath(heights: List[List[int]]) -> int:
    rows, cols = len(heights), len(heights[0])
    dist = [[float('inf')] * cols for _ in range(rows)]
    dist[0][0] = 0
    heap = [(0, 0, 0)]  # (effort, row, col)
    directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    while heap:
        effort, row, col = heapq.heappop(heap)
        if effort > dist[row][col]:
            continue
        if row == rows - 1 and col == cols - 1:
            return effort
        for dr, dc in directions:
            nr, nc = row + dr, col + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = abs(heights[nr][nc] - heights[row][col])
                new_effort = max(effort, step)
                if new_effort < dist[nr][nc]:
                    dist[nr][nc] = new_effort
                    heapq.heappush(heap, (new_effort, nr, nc))

    return dist[rows - 1][cols - 1]

This works because the greedy invariant still holds: once we extract a cell with a given effort, no future path can reach that cell with less effort (since all step costs are non-negative, max only increases or stays flat as the path gets longer).

4x4 height grid with the optimal low-effort path traced in amber arrows from top-left to bottom-right. Right panel shows the modified relaxation: new = max(effort, step) instead of new = dist[u] + weight.

The lesson: Dijkstra's structure generalizes beyond sum-of-weights. Any monotone cost function where processing the cheapest state first is greedy-safe will work. Recognize this shape and you unlock a whole family of modified-Dijkstra problems.

What to Keep

Dijkstra finds single-source shortest paths on weighted graphs with non-negative edge weights.
The core move is relaxation: if dist[u] + w < dist[v]: dist[v] = dist[u] + w.
The greedy invariant: when a vertex is extracted from the min-heap, its distance is final and correct. This relies on all weights being non-negative.
Negative weights break the invariant. Use Bellman-Ford instead.
Binary heap gives O((V+E) log V) time. Simple array gives O(V²), better for dense graphs.
Lazy deletion (push duplicates, skip stale entries) is simpler than decrease-key and asymptotically equivalent.
Track prev[] during relaxation to reconstruct the actual shortest path.
The cost metric does not have to be a sum. Max, min, and other monotone functions work as long as the greedy property holds.
If the graph is a DAG, skip Dijkstra entirely and use topological sort plus relaxation in O(V+E).

If you want to practice Dijkstra under real interview conditions, SpaceComplexity runs voice-based mock interviews that walk you through graph problems with rubric feedback on your reasoning, not just your final answer.