What Is a Spanning Tree? The Graph Concept Behind MST Problems

Most engineers can recite the spanning tree definition on cue. Ask them to build one from scratch, or explain why greedy works, and you get the same look: a thousand-yard stare borrowed from someone who just realized their flight was overbooked.

Your Graph Has Redundant Edges. A Spanning Tree Removes Them.

Take a connected, undirected graph. It has more edges than it needs. Some edges are redundant in the sense that removing them leaves the graph still connected. A spanning tree keeps exactly enough edges to hold every vertex together, and no more.

A spanning tree of a graph G with vertex set V is a subgraph that:

Includes every vertex in V
Is connected
Contains no cycles
Has exactly V - 1 edges

That last point follows from the other three. A tree with n nodes always has n - 1 edges. Add one more and you introduce a cycle. Remove any existing edge and the tree splits.

Four cities, five roads between them:

    A
   /|\
  B | D
   \|/
    C

Edges: A-B, A-C, A-D, B-C, C-D. Any spanning tree needs exactly three edges (V - 1 = 3).

Valid spanning trees:

A-B, A-C, A-D (star from A)
A-B, B-C, C-D (path through the graph)
A-D, D-C, C-B (another path)

None is more correct than the others. All are connected, acyclic, four vertices, three edges.

Why V - 1 Is Exact, Not a Coincidence

Start with n isolated vertices and zero edges. You have n separate components. Each edge you add can reduce the component count by at most one. If both endpoints are already in the same component, the edge creates a cycle instead.

To go from n components down to one, you need exactly n - 1 merges. Add one more edge and you get a cycle. Remove any existing edge and you get two components. V - 1 is a knife edge.

This invariant comes up constantly in interview problems. "Determine whether a graph is a tree" reduces to: exactly V - 1 edges and connected? Two conditions. You can check them in one pass. Don't overthink it.

Most Graphs Have Way Too Many Spanning Trees

A complete graph on n vertices has n^(n-2) spanning trees, a result known as Cayley's formula. For n = 4: 4^2 = 16. For n = 6: 6^4 = 1,296. For n = 10: 10^8 = 100 million.

This abundance is why the minimum spanning tree problem is interesting. Among all spanning trees of a weighted graph, which one has the smallest total edge weight? Every MST algorithm is an answer to that question.

Finding Any Spanning Tree: BFS and DFS Are Enough

You don't need Kruskal's or Prim's to find a spanning tree. BFS and DFS both produce one naturally.

from collections import deque

def bfs_spanning_tree(graph: dict[str, list[str]], start: str) -> list[tuple[str, str]]:
    visited = {start}
    queue = deque([start])
    tree_edges = []

    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                tree_edges.append((node, neighbor))
                queue.append(neighbor)

    return tree_edges


graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C"],
}

print(bfs_spanning_tree(graph, "A"))
# [('A', 'B'), ('A', 'C'), ('A', 'D')]

Every time BFS visits a new vertex through an edge, that edge goes into the spanning tree. Edges leading to already-visited vertices are discarded, which prevents cycles.

BFS produces wide, shallow trees. DFS produces tall, narrow ones. Neither gives the minimum-weight spanning tree. For that you need weights. And when weights show up, the nice easy BFS code above stops being enough.

Groot staring blankly: "My face when the interviewer asks me what sync.Pool is actually doing internally in Go and how it interacts with the garbage collector but all I know is how to reverse a linked list"

The interview is going fine until "explain why greedy works."

The Minimum Spanning Tree: Weights Change Everything

When edges have weights, spanning trees stop being equally good. Some are objectively worse.

    A
   /|\
  5 3 7
 /  |  \
B   |   D
 \  |  /
  4 | 2
   \|/
    C

Edges and weights: A-B=5, A-C=3, A-D=7, B-C=4, C-D=2.

Spanning tree A-C (3), C-D (2), B-C (4): total weight 9. Spanning tree A-C (3), C-D (2), A-B (5): total weight 10. Spanning tree A-B (5), B-C (4), C-D (2): total weight 11.

The MST is A-C, C-D, B-C with total weight 9.

MSTs have a property called the cut property: for any partition of vertices into two groups, the cheapest edge crossing that partition is guaranteed to be in the MST. This is the correctness argument behind every greedy MST algorithm. Memorize this sentence. An interviewer who asks "why does greedy work?" wants exactly this.

Kruskal's and Prim's: Same Goal, Different Approach

Both find the MST. They just think about the problem differently.

Kruskal's sorts all edges by weight, then greedily adds the cheapest edge that doesn't create a cycle. It builds the MST by merging components bottom-up. Cycle detection is handled by Union-Find, which answers "are these two vertices in the same component?" in near-constant time.

Prim's grows the MST outward from a single starting vertex. At each step, it picks the cheapest edge connecting the current tree to a vertex not yet in the tree. A greedy expansion driven by a min-heap.

Kruskal's is the default first choice in interviews. Simpler to code, the Union-Find cycle check is a single function, and the edge-sorting step is obvious. Prim's edges ahead on dense graphs (where E approaches V^2), but dense graphs are rare in interview problems. Pick Kruskal's, implement it cleanly, explain the cut property, go home.

For implementation details and full complexity analysis, see the dedicated guides on Kruskal's algorithm and Prim's algorithm.

Split image: left side glowing "Looking at your 'beautiful' code at 3am" vs right side confused "Looking at the exact same code after you wake up"

Greedy feels obviously correct in the moment. The cut property is why it actually is.

The Properties Interviewers Actually Test

Uniqueness. If all edge weights are distinct, the MST is unique. If weights repeat, multiple MSTs may share the same total weight, and any valid one is acceptable.

Cycle property. The maximum-weight edge in any cycle is never part of the MST. Remove it and the remaining cycle edges still provide connectivity at lower cost.

MST is not a shortest-path tree. This trips people up constantly. Dijkstra's gives the shortest path from a source to every other vertex. The MST minimizes total edge weight across the whole tree, not path lengths from a source. Different problem, different algorithm. Confusing the two in an interview is the graph theory equivalent of answering a question about TCP when the interviewer asked about UDP.

Spanning forests. If the graph is disconnected, you get a spanning forest: one spanning tree per connected component.

Where Spanning Trees Show Up in Interviews

The tell: connected graph, weighted edges, connect everything with minimum total cost.

"Minimum Cost to Connect All Points" (LeetCode 1584) is a direct MST problem. The "graph" is a set of 2D points, edge weight is Manhattan distance, and you need the MST of that complete graph. Prim's works well here because the graph is dense.

"Is this graph a tree?" Check: exactly V - 1 edges and connected. If yes, it's already a spanning tree of itself.

Network design problems ("cheapest way to wire all offices") are MST problems with a business label. The structure is identical.

Clustering. Build the MST of a point cloud and remove the k - 1 most expensive edges. You get k clusters. Single-linkage hierarchical clustering works exactly this way.

Explaining why the greedy cut property guarantees correctness is what separates strong-hire answers from passable ones. SpaceComplexity runs voice-based mock interviews where the follow-ups go exactly there: "why does greedy work?" and "how do you know this is optimal?"

Complexity at a Glance

Approach	Time	Space
BFS spanning tree	O(V + E)	O(V)
DFS spanning tree	O(V + E)	O(V)
Kruskal's MST	O(E log E)	O(V + E)
Prim's MST (binary heap)	O(E log V)	O(V + E)

The O(E log E) sorting step in Kruskal's dominates. Since E < V^2, log E < 2 log V, so O(E log E) and O(E log V) differ by only a constant factor in practice. The asymptotic difference between the two algorithms is smaller than it looks on paper.

Key Takeaways

A spanning tree uses all V vertices, exactly V - 1 edges, and no cycles.
BFS and DFS find a spanning tree in O(V + E). They don't find the minimum one.
The cut property is why greedy works: the cheapest edge crossing any cut is always in the MST.
MST and shortest-path tree are different things with different algorithms.
"Connect all nodes with minimum cost" means MST. "Shortest path from A to everywhere" means Dijkstra's.

For graph representation and traversal algorithms, see Graph Data Structure Interview and Union-Find Algorithm.