Prim's Algorithm: Grow the Cheapest Tree, One Edge at a Time

May 26, 202628 min read
dsaalgorithmsinterview-prepdata-structures
Prim's Algorithm: Grow the Cheapest Tree, One Edge at a Time
TL;DR
  • Prim's algorithm grows one minimum spanning tree from a start vertex by always picking the cheapest edge crossing the tree/non-tree boundary.
  • Correctness follows from the cut property: the minimum-weight crossing edge for any cut belongs to some MST, proved by exchange argument.
  • Prim's algorithm time complexity is O(V²) with a simple array, O(E log V) with a binary heap, and O(E + V log V) with a Fibonacci heap.
  • Lazy deletion (push stale entries, skip visited vertices on pop) is the practical implementation since most standard heaps don't support decrease-key.
  • Prim's works with negative weights but fails on disconnected graphs; detect failure by checking that edges used equals V.
  • Choose Prim's over Kruskal's for dense graphs; for sparse graphs both are O(E log V) and either is fine.
  • MST signal keywords: "connect all nodes with minimum cost," "minimize the maximum edge weight," or "exactly N-1 connections."

You have a graph. Every edge has a weight. You want to connect every vertex with exactly V-1 edges, at minimum total weight. No leaving anyone out. That's a minimum spanning tree, and Prim's algorithm is the cleanest way to build one.

The mental model: start anywhere. Draw a circle around the single vertex you've chosen. That circle is your "tree side." Everything outside is the "non-tree side." Repeatedly find the cheapest edge that crosses the boundary, add it, expand the circle, repeat. When the circle swallows every vertex, you're done. The edges you added form the MST.

Vojtěch Jarník worked this out in 1930. Robert Prim rediscovered it independently in 1957. Edsger Dijkstra found it again in 1959. Three separate people converged on the same algorithm across three decades. That's either mathematical inevitability or a reflection of how limited 1930s entertainment options were. It's sometimes called the DJP algorithm or the Jarník algorithm, and the core idea is old enough that it should feel obvious in hindsight. That's usually the mark of a good algorithm.


The Cut, the Frontier, and the Growing Tree

A cut in a graph is a partition of its vertices into two disjoint sets S and V-S. A crossing edge connects a vertex in S to a vertex in V-S.

Prim's algorithm maintains a cut at every step: S is the set of vertices already in the tree, V-S is the rest. The algorithm repeatedly picks the minimum-weight crossing edge and adds it to the tree. Because we always pick the global minimum crossing edge, we grow one connected component from start to finish. There are never two separate pieces to reconcile.

Initial state: 5-vertex graph
Vertices: 0, 1, 2, 3, 4
Edges: 0-1(2), 0-3(6), 1-2(3), 1-3(8), 1-4(5), 2-4(7), 3-4(9)

Step 1: tree = {0}   frontier edges = [0-1(2), 0-3(6)]
        pick 0-1(2)

Step 2: tree = {0,1} frontier edges = [1-2(3), 1-4(5), 0-3(6), 1-3(8)]
        pick 1-2(3)

Step 3: tree = {0,1,2} frontier edges = [1-4(5), 0-3(6), 2-4(7), 1-3(8)]
        pick 1-4(5)

Step 4: tree = {0,1,2,4} frontier edges = [0-3(6), 1-3(8), 3-4(9)]
        pick 0-3(6)

Step 5: tree = {0,1,2,3,4}  (all vertices included)
        MST total = 2 + 3 + 5 + 6 = 16

At each step, the minimum crossing edge points to a vertex not yet in the tree. That vertex joins. Its edges become candidates. Any edge that now connects two tree vertices is discarded because it would create a cycle.

Prim's cut property: blue vertices in the tree (S), amber vertices outside (V-S), green edge is the safe minimum crossing edge

The cut at any step: S (blue) is the tree side, V-S (amber) is the rest. The green edge is the minimum crossing edge, and the cut property guarantees it belongs to some MST.

Prim's algorithm growing the MST across five steps, with priority queue state shown below

Each panel shows the tree after adding one edge. By step 5, all vertices are blue and the MST is complete.


Why Greedy Works: The Cut Property

Greedy algorithms have a bad reputation. A lot of them look right, work on small examples, and then quietly fail on the fifth test case. Prim's is not one of those. It has a genuine proof.

Programmers when they encounter a greedy algorithm that actually works

Same energy every time.

The cut property: for any cut (S, V-S) of a connected graph, the minimum-weight edge crossing the cut belongs to some MST.

Proof by contradiction. The fun kind, where you assume the wrong thing and watch it unravel.

Let e be the minimum crossing edge for some cut (S, V-S). Suppose every MST avoids e. Take any MST T*. Add e to T*: this creates exactly one cycle C, because T* was a spanning tree. That cycle must have at least one other edge f that also crosses the cut (the path from e's endpoint in S to e's endpoint in V-S must re-cross the boundary somewhere). Since e is the minimum crossing edge, w(e) ≤ w(f). Now swap: let T' = T* - {f} + {e}. T' is still connected and has V-1 edges, so it's a spanning tree. Its weight is w(T*) - w(f) + w(e) ≤ w(T*). So T' is also an MST, and T' contains e. Contradiction: no MST avoids e.

The induction for Prim's correctness follows from this directly. At step k, the k vertices in S form a valid partial MST by inductive hypothesis. The minimum edge crossing the cut (S, V-S) is safe to add by the cut property. After V-1 steps, all vertices are in S and every edge added was safe. QED.

One subtlety: the cut property requires the minimum crossing edge to be unique for the MST to be unique. If multiple edges share the same minimum weight, the cut property still holds (the minimum-weight edge is in some MST), but Prim's might produce different MSTs depending on tie-breaking. All of them are valid.


When the Proof Breaks (and Why It Doesn't)

Prim's algorithm works correctly with negative edge weights. This surprises people who conflate it with Dijkstra's, which famously fails on negative weights. If someone tells you Prim's breaks on negatives, they're confusing the two algorithms. Common mistake. The cut property doesn't care about sign. Picking the minimum crossing edge is safe whether weights are positive, zero, or negative.

What does break Prim's: a disconnected graph. If the graph has no path between two vertices, the algorithm will exhaust the reachable component and stop with fewer than V vertices in the tree. The right approach is to detect this (MST size is less than V-1) and either run Prim's from a new starting vertex in each component, producing a spanning forest, or conclude no spanning tree exists.


Four Implementations, Four Complexities

The same algorithm, four implementations, four different complexities. The difference is entirely in how you find the minimum crossing edge.

ImplementationTimeSpaceBest for
Adjacency matrix, linear scanO(V²)O(V)Dense graphs (E ≈ V²)
Binary heap, lazy deletionO(E log E)O(E)Sparse graphs, simple code
Binary heap, eager (indexed PQ)O(E log V)O(V)Sparse graphs, lower space
Fibonacci heapO(E + V log V)O(V)Asymptotically optimal

Visual comparison of the four Prim's implementations with time complexity, space complexity, and use case

The lazy binary heap is the practical default. The Fibonacci heap is optimal on paper and complicated enough to earn its own footnote in most textbooks.

Why O(V²) for the matrix version: you run V iterations. Each iteration scans all V vertices to find the one with the minimum key value: O(V). Then you scan all V neighbors to update keys: O(V). Total: V * O(V) = O(V²). No heap, no complex data structure. For dense graphs where E approaches V², this beats the heap-based approach since O(E log V) = O(V² log V) is worse.

Why O(E log V) for the heap version: you extract the minimum vertex V times: V * O(log V) = O(V log V). For each extracted vertex, you check its edges and potentially decrease the key of its neighbors. Across all vertices, the total edges examined is E, giving E decrease-key operations. Each decrease-key on a binary heap is O(log V). Total: O(V log V + E log V) = O(E log V) for connected graphs (where E ≥ V-1). Note that O(E log E) and O(E log V) are equivalent because E ≤ V² means log E ≤ 2 log V.

Why O(E + V log V) for a Fibonacci heap: decrease-key is O(1) amortized on a Fibonacci heap (lazy cascading cuts defer the restructuring work). Only extract-min costs O(log V) amortized. So: V extract-min operations at O(log V) each gives O(V log V), plus E decrease-key operations at O(1) each gives O(E). Total: O(E + V log V). Fibonacci heaps have roughly the implementation complexity of a small country's tax code, which is why nobody uses them outside of theory papers.

The practical winner for sparse graphs is the lazy binary heap. Most standard library heaps (Python's heapq, Java's PriorityQueue, Go's container/heap) don't support decrease-key. The lazy workaround: push a new entry whenever you find a better edge, keep a visited set, and skip stale entries when you pop them. This costs O(E) space instead of O(V) but is straightforward and fast in practice. The code below uses this approach throughout.


What's in the Heap at Each Step

The algorithm needs three things: the graph, the priority queue, and the visited set.

Graph (adjacency list):
vertex 0 → [(1, 2), (3, 6)]
vertex 1 → [(0, 2), (2, 3), (3, 8), (4, 5)]
vertex 2 → [(1, 3), (4, 7)]
vertex 3 → [(0, 6), (1, 8), (4, 9)]
vertex 4 → [(1, 5), (2, 7), (3, 9)]

Min-heap entries: (weight, destination)
Initial: push (0, start_vertex)

Heap state at each step:
After popping (0,0):  heap = [(2,1), (6,3)]
After popping (2,1):  heap = [(3,2), (5,4), (6,3), (8,3)]
After popping (3,2):  heap = [(5,4), (6,3), (7,4), (8,3)]
After popping (5,4):  heap = [(6,3), (7,4)*, (8,3), (9,3)]
                                    * stale, 4 is visited
After popping (6,3):  heap = [(7,4)*, (8,3)*, (9,3)*]
                                    all stale, done

The visited array/set is the key to correctness. When you pop an entry from the heap, check if the destination is already in the tree. If it is, discard and continue. Stale entries accumulate like notifications you've decided to ignore. Harmless until the heap is empty.


One Structure, Every Language

All implementations use the same lazy binary heap approach. Input: an adjacency list where graph[u] is a list of (neighbor, weight) pairs. Output: the total MST weight, or -1 if the graph is disconnected.

The example graph: 5 vertices (0 to 4), edges as above. Expected output: 16.

import heapq def prim(graph: list[list[tuple[int, int]]]) -> int: n = len(graph) visited = [False] * n total = 0 edges_used = 0 heap = [(0, 0)] # (weight, vertex) while heap and edges_used < n: weight, u = heapq.heappop(heap) if visited[u]: continue visited[u] = True total += weight edges_used += 1 for neighbor, w in graph[u]: if not visited[neighbor]: heapq.heappush(heap, (w, neighbor)) return total if edges_used == n else -1 # Example graph = [ [(1, 2), (3, 6)], # vertex 0 [(0, 2), (2, 3), (3, 8), (4, 5)], # vertex 1 [(1, 3), (4, 7)], # vertex 2 [(0, 6), (1, 8), (4, 9)], # vertex 3 [(1, 5), (2, 7), (3, 9)], # vertex 4 ] print(prim(graph)) # 16

Lazy vs Eager: The Practical Tradeoff

Most implementations above use lazy deletion. When you discover a better edge to a vertex, you push the new entry without removing the old one. When you pop an already-visited vertex, you skip it. The name "lazy deletion" is accurate in the most affectionate way possible. You're not cleaning up after yourself, and it works out fine.

The eager version keeps exactly one entry per non-tree vertex in the heap, always the best known edge to that vertex. It uses an indexed priority queue that supports decrease-key. Space drops from O(E) to O(V). For very dense graphs with E >> V, this matters.

In practice, most standard heap libraries don't expose decrease-key, so lazy deletion dominates real code.

Lazy deletion vs eager indexed PQ: the lazy heap accumulates stale entries and skips them on pop; the eager heap keeps exactly one entry per vertex via decrease-key

Lazy: push duplicates, skip on pop, O(E) entries. Eager: one entry per vertex, requires decrease-key, O(V) entries. Both correct. One exists in your stdlib.

The non-obvious space pitfall: the lazy heap can hold up to O(E) entries, one for every edge in the graph. For a complete graph with V vertices that's O(V²) entries. If memory is tight and your graph is dense, use the O(V²) adjacency matrix version instead. It uses only O(V) extra space for the key array.


What Problems Prim's Solves

Prim's algorithm solves any problem that reduces to finding a minimum-weight spanning tree of an undirected graph. That category is broader than it looks.

Minimum cost connectivity. Connect all nodes (cities, servers, sensors, data centers) with the lowest total edge cost. This is the textbook application: telephone networks, power grids, road networks, fiber routing. If you can model the "must connect everything" requirement as an undirected graph with edge costs, you want an MST.

Minimum bottleneck spanning tree. Every MST is also a minimum bottleneck spanning tree. The MST minimizes the maximum edge weight on any path between two vertices. For any two vertices u and v, the path between them in the MST is the minimax path: the path whose heaviest edge is as light as possible. If you need to send data and want to minimize the worst link you pass through, the MST path is optimal.

Single-linkage clustering. Build the MST of your data points (with edge weights as pairwise distances), then remove the k-1 heaviest edges. You get k clusters. This MST-based approach runs in O(E log V) rather than the naive O(V² * k).

Approximate TSP. If edge weights satisfy the triangle inequality (metric TSP), doubling the MST and shortcutting produces a 2-approximation for the traveling salesman problem. Christofides' algorithm (a 1.5-approximation) starts from the MST. The MST is the foundation.

Network spanning tree protocol. Ethernet switches use Spanning Tree Protocol (STP/RSTP) to prevent loops in a network. The protocol builds a spanning tree of the switch graph with minimum-cost links active.

Image segmentation. Felzenszwalb-Huttenlocher segmentation (2004) builds an MST of a pixel graph and partitions it by removing edges that exceed an adaptive threshold. Used in computer vision pipelines.


Recognizing the Pattern

The signal that a problem wants an MST: you have a set of things, a cost to connect any two of them, and you want to connect all of them with minimum total cost.

More precisely, look for:

  • "Connect all N nodes with minimum cost" where connection is undirected and costs are pairwise. Direct translation to MST.
  • "Minimum cost such that all nodes are reachable from any node." Same thing.
  • "Minimize the maximum edge weight used." Minimum bottleneck spanning tree, which MST solves.
  • N items and you need exactly N-1 connections. That's a spanning tree by definition.

Things that look like MST but aren't: Steiner trees (you don't have to include all vertices, only a required subset), shortest paths between specific pairs (that's Dijkstra or APSP), minimum cost flows (that's a flow problem).

Worked Example 1: LeetCode 1584, Min Cost to Connect All Points

You're given n points in a 2D plane. The cost to connect two points is their Manhattan distance. Return the minimum cost to connect all points.

Why MST fits. You want to connect all n points. Connection is undirected (Manhattan distance is symmetric). You want minimum total cost. That's an MST. The graph isn't given explicitly: there are n*(n-1)/2 possible edges (every pair of points), so you build the adjacency list on the fly.

Prim's is the better choice here over Kruskal's. Kruskal's requires sorting all O(n²) edges first: O(n² log n). The O(n²) adjacency-matrix Prim's is actually optimal here because the graph is complete (E = O(V²), so E log V = V² log V, which is worse than just O(V²)).

import heapq def minCostConnectPoints(points: list[list[int]]) -> int: n = len(points) visited = [False] * n heap = [(0, 0)] # (cost, point_index) total = 0 connected = 0 while heap and connected < n: cost, u = heapq.heappop(heap) if visited[u]: continue visited[u] = True total += cost connected += 1 for v in range(n): if not visited[v]: dist = abs(points[u][0] - points[v][0]) + abs(points[u][1] - points[v][1]) heapq.heappush(heap, (dist, v)) return total

The inner loop pushes up to n edges per vertex, giving O(n²) heap entries and O(n² log n) time. For this problem that's fine because there's no way to do better than O(n²) since the input itself has n² pairs.

Worked Example 2: LeetCode 1135, Connecting Cities With Minimum Cost

You're given n cities (labeled 1 to n) and a list of connections: each connection is [city1, city2, cost]. Return the minimum cost to connect all cities, or -1 if it's impossible.

Why MST fits. Same structure: connect all n nodes, undirected edges with costs, minimize total. The edges are given explicitly this time, so Kruskal's (sort all edges, union-find) and Prim's (build adjacency list, heap) are both clean. For sparse input (few connections), both run in O(E log E). For dense input, prefer Prim's.

import heapq from collections import defaultdict def minimumCost(n: int, connections: list[list[int]]) -> int: graph = defaultdict(list) for u, v, cost in connections: graph[u].append((v, cost)) graph[v].append((u, cost)) visited = set() heap = [(0, 1)] # (cost, city), 1-indexed total = 0 while heap and len(visited) < n: cost, u = heapq.heappop(heap) if u in visited: continue visited.add(u) total += cost for neighbor, w in graph[u]: if neighbor not in visited: heapq.heappush(heap, (w, neighbor)) return total if len(visited) == n else -1

The pattern is identical to the base implementation. The only difference is 1-indexed vertices and the edge list format.


Prim's vs Kruskal's: Choose by Graph Density

Both algorithms find the MST. Your choice depends on the graph structure and the available data structures.

Prim'sKruskal's
ApproachVertex-centric, one growing componentEdge-centric, merging components
Best forDense graphsSparse graphs
Data structurePriority queueSort + Union-Find
Disconnected graphsGets partial tree, needs detectionNaturally gives spanning forest
Negative weightsWorks correctlyWorks correctly
ImplementationRequires heapRequires sorting + Union-Find

For dense graphs (E close to V²): use Prim's with a simple array and O(V²) time. The heap-based version's O(E log V) = O(V² log V) is actually slower. The Kruskal's article covers Union-Find in depth if you want to compare the two approaches directly.

For interview purposes, both algorithms are acceptable for most MST problems. Pick the one you can implement cleanly under pressure. Most engineers find Prim's slightly easier to reason about because it mirrors Dijkstra's structure.


Recap

  • Prim's grows one MST from a starting vertex by always picking the cheapest edge crossing the tree/non-tree boundary.
  • Correctness comes from the cut property: the minimum crossing edge for any cut belongs to some MST.
  • Time complexity: O(V²) with a simple array (dense graphs), O(E log V) with a binary heap (sparse graphs), O(E + V log V) with a Fibonacci heap (theoretically optimal).
  • Space complexity: O(E) for the lazy heap implementation, O(V) for the eager version or the matrix version.
  • Most standard library heaps don't support decrease-key, so use the lazy approach: push stale entries, skip visited vertices on pop.
  • Prim's works with negative edge weights. It does not work on disconnected graphs (you get a partial tree and must detect the failure).
  • Prefer Prim's over Kruskal's for dense graphs. For sparse graphs, both are O(E log V) and the choice doesn't matter much.
  • MST problems show up as: connect all nodes with minimum cost, minimize the maximum edge on any path, single-linkage clustering, TSP approximation.

If you want to practice explaining Prim's under interview pressure, not just implementing it, SpaceComplexity runs voice-based mock interviews with rubric feedback on exactly this kind of graph problem. The algorithm is easy to code quietly. Explaining why the cut property makes it correct while an interviewer listens is a different skill.

For more graph fundamentals, the graph data structure overview covers adjacency lists and BFS/DFS, and the Dijkstra's algorithm deep dive walks through the closely related single-source shortest path algorithm.


Further Reading