What Is a Minimum Spanning Tree? Graph Connectivity at Minimum Cost

You have five cities and a budget problem. Each pair of cities has a road, but roads cost money to maintain. You want every city reachable from every other. You also don't want to pay for roads you don't need.

That's the minimum spanning tree problem. It shows up in interviews more often than people expect, and it has a clean, almost suspiciously elegant solution.

Looking for minimum spanning tree be like

Your face when you see "connect all N cities at minimum cost" and you know exactly what to do.

Spanning Trees Come First

Before "minimum," understand "spanning tree."

A spanning tree of a connected undirected graph G with V vertices is a subgraph that includes every vertex, has no cycles, and stays connected. A spanning tree always has exactly V-1 edges. One fewer edge and the graph disconnects. One more and you've created a cycle somewhere.

Most graphs have many spanning trees. A complete graph on 4 vertices has 16 of them. A complete graph on 10 vertices has 10^8. You want the one where the total edge weight is as small as possible.

One important edge case: if all edge weights are distinct, the MST is unique. If some weights tie, multiple spanning trees can share the same minimum total weight. All of them count.

A Concrete Example

Five cities: A, B, C, D, E. Seven edges:

Edge	Weight
A-B	4
A-C	2
B-C	1
B-D	5
C-D	8
C-E	10
D-E	2

Any spanning tree needs exactly 4 edges (V-1 = 5-1).

The five-city graph with MST edges highlighted in blue

Blue edges form the MST (total weight: 10). The gray dashed edges either create cycles or cost more than the path you already have.

Work through it greedily. Cheapest edge: B-C at weight 1. Take it. Next cheapest: A-C and D-E, both weight 2. Take both. Three edges in. Next: A-B at weight 4. A and B are already connected through C, so taking this edge creates a cycle. Skip it. Next: B-D at weight 5. B and D are in different components. Take it.

Four edges, all five vertices connected. Total: 1 + 2 + 2 + 5 = 10.

That exact process, "take the cheapest edge that doesn't form a cycle," is Kruskal's algorithm.

Being Greedy Actually Works Here

You skipped A-B (weight 4) and never came back. How do you know the cheapest complete solution didn't need it?

Two properties make the greedy approach provably correct:

Cut property. Divide the vertices into any two non-empty groups. The minimum weight edge crossing between those groups belongs to some MST. This is why you can always extend a partial MST by grabbing the cheapest edge that crosses any partition.

Cycle property. In any cycle in the graph, the maximum weight edge does not appear in any MST. You can always remove it and stay connected via an alternate path. This is why skipping A-B was safe: it closed a cycle, which makes it the maximum edge in that cycle, and maximum cycle edges never end up in an MST.

These two properties are the reason greedy works here when it fails so often elsewhere. No backtracking. No regret. Math says the first cheapest choice is always part of some optimal solution.

Two Algorithms, One MST

Both algorithms find the MST. They just have different working styles.

Kruskal's algorithm sorts all edges by weight and processes them in order, adding each edge if it connects two separate components. It uses Union-Find to detect cycles in near-constant time. Total time: O(E log E), dominated by sorting. The easier one to write from scratch in an interview under pressure.

Prim's algorithm grows a single connected component starting from any vertex. At each step it adds the minimum weight edge connecting the current tree to any vertex not yet included. It uses a min-heap to find that edge efficiently. Total time: O(E log V) with a binary heap. If you've memorized Dijkstra's, Prim's will look familiar because the structure is nearly identical.

For sparse graphs (E close to V), both are similar. For dense graphs (E close to V²), Prim's with an adjacency matrix runs in O(V²), which matches the input size and beats the heap version.

But both answers are correct

Kruskal's sorts edges and sweeps. Prim's grows a tree outward. They arrive at the exact same MST. Both correct. Both annoying.

See Kruskal's algorithm and Prim's algorithm for full implementations in every major language. The comparison of when to prefer each lives in Kruskal's vs Prim's.

Time and Space Complexity

Algorithm	Time	Space
Kruskal's	O(E log E)	O(V + E)
Prim's (binary heap)	O(E log V)	O(V + E)
Prim's (adjacency matrix)	O(V²)	O(V²)

The MST itself stores V-1 edges, so O(V) space for the result.

In practice, O(E log E) and O(E log V) are the same for most graphs because log E ≤ 2 log V (since E ≤ V²). Kruskal's constant factors tend to be smaller for sparse inputs.

When MST Shows Up in Interviews

The signal: "connect all N nodes at minimum total cost." That phrase almost always means MST. When you hear it, you should already be thinking Kruskal's.

Common problems:

LeetCode 1135, Connecting Cities With Minimum Cost. Direct Kruskal's application. You get an edge list. Find the MST weight, or return -1 if the graph is disconnected.

LeetCode 1584, Min Cost to Connect All Points. Points on a 2D plane, cost is Manhattan distance between any pair. There are O(V²) implicit edges. Use Prim's greedily (no need to enumerate all edges upfront) or generate all edges and sort for Kruskal's.

LeetCode 1168, Optimize Water Distribution in a Village. The clever variant. Each village can drill a well at some cost, or connect to a neighbor via pipe. The trick: add a virtual water source vertex connected to every village at its well-drilling cost. Now the whole problem is standard MST. If you can reduce a novel problem to MST by adding a virtual node, that's a signal to interviewers that you own the concept. Explaining why the reduction works before coding is usually worth more than the code itself.

Reach for Kruskal's when:

Edge list is given explicitly
Graph is sparse
You want simpler code

Reach for Prim's when:

Graph is dense or given as an adjacency matrix
You need to generate edges on the fly (like problem 1584)
You already have Dijkstra's and want to adapt it

My interviewer when I don't know the right algorithm

"They literally said minimum cost to connect all the cities." The signal is not subtle.

One Algorithm, Four Languages

Kruskal's with Union-Find path compression and union by rank. This is the version worth memorizing.

# Python 3
def minimum_spanning_tree(n: int, edges: list[tuple[int, int, int]]) -> int:
    parent = list(range(n))
    rank = [0] * n

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x: int, y: int) -> bool:
        px, py = find(x), find(y)
        if px == py:
            return False
        if rank[px] < rank[py]:
            px, py = py, px
        parent[py] = px
        if rank[px] == rank[py]:
            rank[px] += 1
        return True

    edges.sort(key=lambda e: e[2])
    total = 0
    edges_used = 0

    for u, v, w in edges:
        if union(u, v):
            total += w
            edges_used += 1
            if edges_used == n - 1:
                break

    return total if edges_used == n - 1 else -1

// TypeScript
function minimumSpanningTree(n: number, edges: [number, number, number][]): number {
    const parent = Array.from({ length: n }, (_, i) => i);
    const rank = new Array(n).fill(0);

    function find(x: number): number {
        while (parent[x] !== x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    function union(x: number, y: number): boolean {
        const px = find(x);
        const py = find(y);
        if (px === py) return false;
        if (rank[px] < rank[py]) {
            parent[px] = py;
        } else if (rank[px] > rank[py]) {
            parent[py] = px;
        } else {
            parent[py] = px;
            rank[px]++;
        }
        return true;
    }

    edges.sort((a, b) => a[2] - b[2]);
    let total = 0;
    let edgesUsed = 0;

    for (const [u, v, w] of edges) {
        if (union(u, v)) {
            total += w;
            edgesUsed++;
            if (edgesUsed === n - 1) break;
        }
    }

    return edgesUsed === n - 1 ? total : -1;
}

// Java
import java.util.Arrays;

class Solution {
    private int[] parent;
    private int[] rank;

    private int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    private boolean union(int x, int y) {
        int px = find(x);
        int py = find(y);
        if (px == py) return false;
        if (rank[px] < rank[py]) {
            int tmp = px; px = py; py = tmp;
        }
        parent[py] = px;
        if (rank[px] == rank[py]) rank[px]++;
        return true;
    }

    public int minimumSpanningTree(int n, int[][] edges) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        Arrays.sort(edges, (a, b) -> a[2] - b[2]);
        int total = 0;
        int edgesUsed = 0;

        for (int[] edge : edges) {
            if (union(edge[0], edge[1])) {
                total += edge[2];
                edgesUsed++;
                if (edgesUsed == n - 1) break;
            }
        }

        return edgesUsed == n - 1 ? total : -1;
    }
}

// C++
#include <algorithm>
#include <numeric>
#include <vector>

class Solution {
    std::vector<int> parent, rank_;

    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    bool unite(int x, int y) {
        int px = find(x), py = find(y);
        if (px == py) return false;
        if (rank_[px] < rank_[py]) std::swap(px, py);
        parent[py] = px;
        if (rank_[px] == rank_[py]) rank_[px]++;
        return true;
    }

public:
    int minimumSpanningTree(int n, std::vector<std::vector<int>>& edges) {
        parent.resize(n);
        rank_.assign(n, 0);
        std::iota(parent.begin(), parent.end(), 0);

        std::sort(edges.begin(), edges.end(),
            [](const auto& a, const auto& b) { return a[2] < b[2]; });

        int total = 0, edgesUsed = 0;
        for (auto& edge : edges) {
            if (unite(edge[0], edge[1])) {
                total += edge[2];
                if (++edgesUsed == n - 1) break;
            }
        }

        return edgesUsed == n - 1 ? total : -1;
    }
};

How to Talk Through MST in an Interview

The moment you hear "connect all nodes at minimum cost," say it out loud: "This looks like a minimum spanning tree. I'll use Kruskal's with Union-Find."

Then walk through the structure:

Sort edges by weight.
Greedily include the cheapest edge that connects two separate components.
Stop when you have V-1 edges.
If you process all edges without reaching V-1, the graph is disconnected and no MST exists.

Always check the disconnected case. Return -1 or handle it explicitly. Missing this is one of the most common oversights interviewers notice, and it costs points in testing dimensions.

For Prim's, the framing shifts slightly: "Start from any vertex. At each step, extend the tree by the cheapest edge that reaches a new vertex." This is Dijkstra's with edge weights replacing distances, a useful framing if the interviewer probes for connections between algorithms.

For problem 1168, walk through the virtual node reduction before you write a single line of code. Why does adding a fake water source work? Because every village's well-drilling cost becomes just another edge, and now you have a standard MST problem. Explaining the insight earns more signal than the implementation.

The verbal explanation is the part grinding LeetCode doesn't train. SpaceComplexity runs realistic voice-based mock interviews with rubric scoring on exactly this kind of communication, so you can sharpen it before the real thing.

Key Takeaways

A spanning tree uses all V vertices, has exactly V-1 edges, and has no cycles.
If all edge weights are distinct, the MST is unique.
Kruskal's (sort edges + Union-Find): O(E log E). Best for sparse graphs and interview implementations.
Prim's (min-heap): O(E log V). Better for dense graphs or when edges are generated on the fly.
The interview signal: "connect all N points at minimum total cost."
Always handle the disconnected case. Return -1 if you finish with fewer than V-1 edges.