Maximum Bipartite Matching: Hopcroft-Karp Finds It in O(E√V)

You have 200 nurses and 200 shifts. Each nurse lists the shifts she can cover. You want to fill as many shifts as possible without double-booking anyone. Trying every possible assignment is astronomical. A greedy heuristic gets stuck. You flail for a bit. You invent some heuristic. The heuristic is wrong.

You have a maximum bipartite matching problem. Hopcroft-Karp solves it in O(E√V). That's fast enough that real scheduling systems actually use it, which means somewhere a hospital scheduler is sleeping soundly because two people wrote a paper in 1973.

Two Groups, Edges Between Them, Maximum Pairings

A bipartite graph is a graph whose vertices split into two disjoint sets, call them L (left) and R (right), such that every edge crosses between the two sides. No edges within L. No edges within R. Think of it as a strict rule at the party: left-siders only talk to right-siders.

A matching is a subset of edges where no two edges share a vertex. A maximum matching is the largest matching the graph can hold.

The mental model: L is workers, R is tasks, edges represent "this worker can do this task." Maximum matching = maximum number of tasks you can simultaneously assign without overloading anyone.

Bipartite graph with matched and unmatched edges. Blue edges are in the matching M; dashed edges are available but unused.

Matched edges in blue, available edges dashed. No vertex touches more than one matched edge.

Augmenting Paths: The Mechanism Under Every Matching Algorithm

Suppose you have a matching M. An alternating path is a path that alternates between edges not in M and edges in M. An augmenting path is an alternating path that starts and ends at unmatched vertices.

If you find an augmenting path, you can grow the matching by one. Flip every edge along the path: edges in M leave M, edges not in M join M. Because the path started and ended at unmatched vertices, you gain exactly one new matched pair. It sounds like a shell game. It is, a little. But it works.

Berge's theorem (1957): A matching M is maximum if and only if there is no M-augmenting path.

The proof in both directions is short. If an augmenting path exists, flipping it gives a larger matching, so M was not maximum. Conversely, if M is not maximum, let M* be a larger matching. Look at the symmetric difference M △ M* (edges in exactly one of the two matchings). This subgraph decomposes into alternating paths and cycles. Since |M*| > |M|, at least one component must be a path with more M* edges than M edges, and such a path is an M-augmenting path.

This theorem is the foundation of every practical matching algorithm. The structure is always: find augmenting paths, flip them, repeat until none remain.

Before and after flipping an augmenting path C-2-B-3. The matching grows from size 2 to size 3.

The green path is augmenting. Flip it and you gain one free pair. Every matching algorithm does exactly this.

The Naive Approach: One Path at a Time

The simplest implementation: start with an empty matching. For each unmatched left vertex u, run a DFS to find an augmenting path. If one is found, flip it. Repeat until no free left vertex finds a path.

Each DFS visits O(E) edges in the worst case. You repeat the outer loop O(V) times, once per unit increase in the matching size. Total: O(V · E). For dense graphs where E = Θ(V²), that's O(V³).

For graphs with a few hundred vertices, this is fine. For anything bigger, you want Hopcroft-Karp.

How Hopcroft-Karp Works

The insight: instead of augmenting along one path per iteration, find a maximal set of vertex-disjoint shortest augmenting paths and augment along all of them simultaneously.

Finding k paths at once costs the same O(E) as finding one. Augmenting along a maximal set each phase forces the next phase's shortest augmenting path to be strictly longer. After O(√V) such phases, the work is done. This is the kind of trick that makes you stare at the proof for ten minutes and then say "that's it?"

The algorithm runs in two repeated phases.

Phase 1: BFS Builds a Layered Graph

Start a BFS simultaneously from all free (unmatched) left vertices. These are layer 0. If you need a refresher on BFS vs DFS and when each traversal applies, BFS vs DFS: One Question Decides It Every Time covers the tradeoffs in depth.

From each layer-0 vertex, follow unmatched edges to right vertices. From each right vertex, follow the matched edge back to the left side. Alternate: unmatched edge (going right), matched edge (going left), unmatched edge (going right).

Stop the BFS when you first reach a free right vertex. Every free right vertex at that "stopping distance" is a valid endpoint of a shortest augmenting path. The BFS also records, for each left vertex, the distance at which it appears in this layered graph.

If the BFS finds no free right vertex at all, the matching is already maximum. Stop.

The BFS layered graph. Layer 0 is free left vertices. Layer 1 is right vertices reached via unmatched edges. Layer 2 is left vertices pulled back via matched edges. Layer 3 is free right vertices where BFS stops.

The BFS fans out in alternating edge directions. The moment it hits a free right vertex, it records that distance and stops expanding. That distance is the length of the shortest augmenting path this phase.

Phase 2: DFS Augments All Shortest Paths at Once

Run a DFS from each free left vertex, following the layered structure (only move to a right vertex's matched partner if that partner is one layer deeper). When you reach a free right vertex, you have found an augmenting path. Flip it.

The key implementation trick: after a successful augmentation through a left vertex w, set dist[w] = INF. This marks w as "used." Any other DFS path that would have gone through w will skip it. This guarantees that all paths augmented in one phase are vertex-disjoint, and that each vertex is visited at most once per phase.

After each DFS phase, go back to the BFS. The matching has grown. Repeat until BFS finds no free right vertex.

What Each Operation Costs

Operation	Time	Space
`add_edge(u, v)`	O(1)	O(1) amortized
`bfs()` (one phase)	O(V + E)	O(V)
`dfs()` (one phase, all starts)	O(V + E)	O(V) call stack
`max_matching()` (complete run)	O(E √V)	O(V + E)

Space includes the adjacency list (O(E)), the two match arrays (O(V)), and the dist array (O(V)).

Why O(E√V) Holds

Each phase costs O(E). The BFS visits each edge at most once: O(E). The DFS, across all starting vertices in one phase, also visits each edge at most once (because the dist[u] = INF trick prevents re-entry into any left vertex after augmentation). So one phase is O(V + E) = O(E) when E ≥ V.

The number of phases is O(√V). This is the hard part, and it comes down to two lemmas.

Lemma 1 (path length increases): Let L be the length of the shortest augmenting path after phase k. After phase k+1, the shortest augmenting path has length strictly greater than L.

Why? In phase k we found a maximal set of vertex-disjoint shortest augmenting paths of length L and augmented all of them. For a path P of length L to still exist in the updated matching, P must "cross" at least one of those augmented paths (since they were maximal). Crossing creates a detour, so P would need to be longer than L. Formally, any M-augmenting path with respect to the new matching contains at least one vertex from the old augmented paths, and threading through that vertex adds at least 2 to the length.

Lemma 2 (bounded remaining work): After √V phases, the shortest augmenting path has length > √V. The remaining deficit is |M*| - |M|, the number of augmenting paths needed to reach a maximum matching. Each remaining augmenting path has length > √V and uses > √V vertices. Since augmenting paths must be vertex-disjoint and the total number of vertices is V, the number of remaining augmenting paths is less than V / √V = √V. So fewer than √V more phases remain.

Total phases: at most 2√V. Each phase: O(E). Grand total: O(E√V).

For dense bipartite graphs where E = Θ(V²), Hopcroft-Karp runs in O(V^2.5). The original 1973 paper titled it "An n^5/2 algorithm for maximum matchings in bipartite graphs" with n being the number of vertices, which matches: V · V^(1/2) = V^(5/2).

Three Theorems You Get for Free

Once you have a maximum matching, three more results fall out at no extra cost. These are not bonus features. They are load-bearing theorems that make bipartite matching useful in completely different-looking problems.

Berge (1957): Already covered. The matching is maximum because no augmenting path remains after the algorithm terminates.

König's theorem (1931): In any bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover. A vertex cover is a set of vertices that touches every edge. König's theorem says the two extremes are equal in bipartite graphs. This does not hold for general graphs (where minimum vertex cover is NP-hard). The construction: given the maximum matching M, run an alternating BFS/DFS from free left vertices. The minimum vertex cover consists of matched left vertices NOT reachable by this traversal, plus matched right vertices that ARE reachable. Complement of this cover gives the maximum independent set (vertices that share no edge), which has size V minus the matching size.

Hall's marriage theorem (1931): A bipartite graph G = (L ∪ R, E) has a perfect matching (every left vertex matched) if and only if for every subset S ⊆ L, the neighborhood N(S) satisfies |N(S)| ≥ |S|. Every set of left vertices must collectively connect to at least as many right vertices. Useful for proving a perfect matching exists (or proving it does not, via a violating set S).

The Layered Graph, Traced

Here is what the BFS builds for a small example. Left vertices are A, B, C. Right vertices are 1, 2, 3. Current matching: A-1, B-2. Free left vertex: C. Free right vertex: 3.

BFS layers (augmenting path search):

  Layer 0: C           <- free left vertex, starting point
               \
                -> 2    <- unmatched edge C->2 (layer 1, right)
               /
  Layer 2: B           <- B is matched to 2, so follow match back
               \
                -> 3    <- unmatched edge B->3 (layer 3, right)

Free right vertex 3 found at layer 3.
Augmenting path: C -> 2 -> B -> 3
After flip: C-2, B-3 matched. A-1 unchanged. Full matching of size 3.

The BFS stops as soon as it hits a free right vertex. All paths of the same layer distance are explored in the subsequent DFS phase.

One Structure, Every Language

The implementation has four parts: an adjacency list for left vertices, a matchL array (left vertex to its matched right vertex), a matchR array (right vertex to its matched left vertex), and a dist array for the BFS layering. Unmatched is represented by -1 (or None / null in some languages). INF is any value larger than the maximum possible matching length.

from collections import deque

INF = float('inf')

class HopcroftKarp:
    def __init__(self, n: int, m: int):
        self.n = n
        self.adj = [[] for _ in range(n)]
        self.match_l: list[int] = [-1] * n
        self.match_r: list[int] = [-1] * m
        self.dist: list[float] = [0.0] * n

    def add_edge(self, u: int, v: int) -> None:
        self.adj[u].append(v)

    def _bfs(self) -> bool:
        queue: deque[int] = deque()
        for u in range(self.n):
            if self.match_l[u] == -1:
                self.dist[u] = 0
                queue.append(u)
            else:
                self.dist[u] = INF
        found = False
        while queue:
            u = queue.popleft()
            for v in self.adj[u]:
                w = self.match_r[v]
                if w == -1:
                    found = True
                elif self.dist[w] == INF:
                    self.dist[w] = self.dist[u] + 1
                    queue.append(w)
        return found

    def _dfs(self, u: int) -> bool:
        for v in self.adj[u]:
            w = self.match_r[v]
            if w == -1 or (self.dist[w] == self.dist[u] + 1 and self._dfs(w)):
                self.match_l[u] = v
                self.match_r[v] = u
                return True
        self.dist[u] = INF
        return False

    def max_matching(self) -> int:
        result = 0
        while self._bfs():
            for u in range(self.n):
                if self.match_l[u] == -1 and self._dfs(u):
                    result += 1
        return result

The One Bug That Kills It

The dist[u] = INF assignment at the end of the DFS is not optional. That single line is what prevents a left vertex from being visited twice in one phase. Without it, two separate DFS trees can both try to route through the same vertex w, one of them augments through it, and now the other one is following a path that no longer exists. The DFS degrades to O(V · E) per phase, blowing up the total complexity to O(V · E · √V). This is worse than the naive approach.

Without dist[u]=INF, two DFS paths visit the same vertex. With it, the second path sees INF and skips cleanly. Each vertex is visited at most once per phase.

Left: without the trick, vertex w is explored twice. Right: after P1 augments through w and sets dist[w]=INF, P2 hits INF and skips. One line. O(E) per phase preserved.

Every implementation above places this line after the neighbor loop, once no valid augmentation was found. Do not move it to the top of the function. Do not remove it.

Similarly: the BFS sets dist[u] = INF for matched left vertices, not 0. Only free left vertices get distance 0. Swapping this wrecks the layered structure.

What Problems Does Bipartite Matching Solve?

Assignment problems: Workers to tasks, students to schools, applicants to positions. Any time you have two groups and one-to-one assignment with capacity 1 on both sides.

Minimum vertex cover in bipartite graphs: By König's theorem, the minimum vertex cover size equals the maximum matching size. The construction of the actual cover from the matching runs in O(V + E).

Maximum independent set in bipartite graphs: Size = V minus the maximum matching size. Free corollary of König's theorem.

Network flow (as a special case): A bipartite matching is equivalent to max-flow in a unit-capacity network: add a source connected to all left vertices, a sink connected from all right vertices, set all capacities to 1. Hopcroft-Karp is faster than general max-flow for this structure because it exploits the bipartite constraint. The graph data structure interview guide covers adjacency list representation and graph traversal patterns if you need to build that foundation first.

Stable matching (Gale-Shapley): A different variant with preference orderings. Hopcroft-Karp finds maximum cardinality; Gale-Shapley finds a stable matching. Related but different problems.

LeetCode 1349 (Maximum Students Taking Exam): Students can sit in seats such that no two students sharing a row can cheat from each other (same row adjacency) and no diagonal adjacency across rows. Model odd-column seats as one side, even-column seats as the other. Adjacent seats that would allow cheating become edges. Maximum independent set in the resulting bipartite graph = V minus max matching.

When Do You Need a Bipartite Matching Algorithm?

Five signals that a problem calls for bipartite matching:

Two explicit groups with cross-edges. "Workers and tasks," "students and courses," "servers and requests." No edges within a group.
Each item can appear in at most one pairing. Not multi-assignment, not covering. One edge per vertex.
Maximize the count of successful pairings. Not the weight, not the cost, just the count. (Weighted bipartite matching is the Hungarian algorithm, a different beast.)
A graph-coloring argument reveals bipartite structure. The problem talks about a grid or graph, and a 2-coloring (checkerboard, odd/even, layer parity) separates the vertices cleanly.
The complement is a minimum vertex cover or maximum independent set. Problems framed as "what is the maximum set of vertices with no two adjacent?" on a bipartite graph.

Worked Example 1: Course Scheduling

Problem: There are N professors and M classrooms. Each professor has a list of classrooms they are willing to teach in. Assign as many professors to classrooms as possible, with each professor in at most one classroom and each classroom hosting at most one professor.

Why bipartite matching: Professors are left vertices, classrooms are right vertices, "willing to teach in" defines the edges. No professor-to-professor or classroom-to-classroom edges exist. We want the maximum set of non-conflicting assignments. Direct application of maximum bipartite matching.

Setup: Create a HopcroftKarp with n = number of professors, m = number of classrooms. For each (professor, classroom) pair where the professor is willing, call add_edge. Call max_matching.

def max_assignments(n_professors, n_classrooms, willing):
    hk = HopcroftKarp(n_professors, n_classrooms)
    for prof, room in willing:
        hk.add_edge(prof, room)
    return hk.max_matching()

Worked Example 2: Grid Coloring with Forbidden Adjacency

Problem (LC 1349 variant): An n x m grid of seats. Some seats are broken (unavailable). Two students cannot sit in adjacent seats in the same row. Maximize the number of students seated.

Why bipartite matching: Color the grid like a checkerboard. "Black" seats form one side, "white" seats form the other side. Two students conflict only if they sit in adjacent seats within the same row, which always connects a black seat to a white seat (in a standard row, columns alternate odd/even). The conflict graph is bipartite. Maximum independent set in this graph = total available seats minus max matching. Use König's theorem.

def max_students(seats):
    rows, cols = len(seats), len(seats[0])
    def seat_id(r, c):
        return r * cols + c

    available = set()
    for r in range(rows):
        for c in range(cols):
            if seats[r][c] == '.':
                available.add(seat_id(r, c))

    n = rows * cols
    hk = HopcroftKarp(n, n)
    for r in range(rows):
        for c in range(cols):
            if seat_id(r, c) not in available:
                continue
            if c % 2 == 0:
                for nc in [c - 1, c + 1]:
                    if 0 <= nc < cols and seat_id(r, nc) in available:
                        hk.add_edge(seat_id(r, c), seat_id(r, nc))

    return len(available) - hk.max_matching()

Practice This on SpaceComplexity

Graph algorithms are notoriously hard to explain under pressure. Bipartite matching adds the extra challenge of explaining the layered BFS structure, the augmenting path argument, and the König's theorem corollary all in the same breath. "So you do BFS and then DFS and then the dist thing and..." is not a complete answer.

If you want realistic practice explaining matching algorithms in a voice-based mock interview with rubric feedback, SpaceComplexity runs the full simulation. You will not be comfortable with this material until you have explained it out loud at least twice.

Recap

A bipartite graph partitions vertices into two sides with edges only between them.
A matching is a set of edges with no shared endpoints. Maximum matching = largest such set.
Berge's theorem: A matching is maximum if and only if no augmenting path exists.
The naive augmenting-path algorithm runs in O(V · E). Hopcroft-Karp runs in O(E√V).
Hopcroft-Karp alternates between BFS (builds a layered graph from all free left vertices) and DFS (augments a maximal set of vertex-disjoint shortest paths).
dist[u] = INF in the DFS ensures each left vertex is visited at most once per phase, keeping each phase O(E).
The number of phases is O(√V): after √V phases the shortest augmenting path length exceeds √V, and the remaining deficit is less than √V paths.
König's theorem: max matching = min vertex cover in bipartite graphs. Max independent set = V minus max matching.
Five recognition signals: two explicit groups, each item in at most one pairing, maximize the count, graph is 2-colorable, complement is a vertex cover or independent set question.
For interview problems with n ≤ 500, the simpler O(V · E) DFS-per-vertex approach works fine. Hopcroft-Karp matters when n reaches tens of thousands.