Floyd-Warshall Algorithm: Finding Every Shortest Path With One DP Table

Run Dijkstra once and you get the shortest distance from one source to every vertex. Useful. But sometimes the problem needs all of them: the shortest path from every vertex to every other vertex.

You could run Dijkstra V times. Nobody is stopping you. Floyd-Warshall solves the same problem in about ten lines, handles negative edge weights without extra setup, and detects negative cycles on the way out. It also generalizes cleanly to problems that have nothing to do with distances.

The mental model: grow the set of "allowed intermediate vertices" one vertex at a time. For each pair (i, j), ask whether routing through the newest intermediate vertex makes the path shorter. Repeat for every vertex. By the time you're done, you have the globally optimal distance for every pair.

Robert Floyd published it in 1962 in Communications of the ACM ("Algorithm 97: Shortest Path"). Stephen Warshall published the boolean transitive-closure version the same year in Journal of the ACM ("A Theorem on Boolean Matrices"). Bernard Roy had the same idea in 1959. Peter Ingerman wrote down the modern three-nested-loops formulation, also in 1962. Three of the four published in 1962 alone. Roy had beaten them to it by three years, got none of the credit, and the algorithm still carries two other people's names. Computer science history is a rough place.

The State Definition That Makes Everything Obvious

Every DP algorithm hinges on a state definition. Get it right and the rest writes itself. Get it wrong and you spend an afternoon debugging a three-nested-loop mess.

Floyd-Warshall's state is:

dist[i][j][k] = the length of the shortest path from vertex i to vertex j, where intermediate vertices are drawn only from {0, 1, ..., k}.

The set of allowed intermediaries grows one vertex at a time. At step k, you're asking: "what's the shortest path from i to j that's allowed to pass through any of the first k+1 vertices?"

The recurrence follows directly. Any shortest path from i to j using vertices {0,...,k} as intermediaries either passes through k or it doesn't.

If it avoids k, the answer is already in dist[i][j][k-1].

If it goes through k, the path looks like i → ... → k → ... → j. By optimal substructure, both sub-paths must themselves be optimal paths within {0,...,k-1}. So the i-to-k segment is dist[i][k][k-1] and the k-to-j segment is dist[k][j][k-1].

The recurrence:

dist[i][j][k] = min(dist[i][j][k-1], dist[i][k][k-1] + dist[k][j][k-1])

Base case (k = -1, no intermediaries allowed): dist[i][j][-1] equals the direct edge weight, 0 if i == j, and infinity if no edge exists.

Two cases: path through k splits into (i to k) + (k to j) using only {0..k-1}; path avoiding k goes direct using {0..k-1} The recurrence in picture form. Either k is on the path (split into two sub-paths) or it isn't (carry forward the existing distance).

The loop order follows directly. k must be the outer loop. When processing intermediate vertex k, you need dist[i][k][k-1] and dist[k][j][k-1] to already be fully computed. Those are the rows and columns of the previous DP layer, which are guaranteed to be ready because k is outermost. Swap the loop order and the algorithm silently produces wrong answers. It will still run. The numbers will look plausible. They'll just be wrong.

Why You Only Need O(V²) Space

The recurrence formally references dist[...][k-1], hinting at a 3D array with O(V³) space. You can do it in O(V²) instead.

When processing outer loop index k, the values dist[i][k] and dist[k][j] never change during that iteration. The update formula for dist[i][k] would be:

dist[i][k] = min(dist[i][k], dist[i][k] + dist[k][k])

Since dist[k][k] = 0 (on any graph without negative cycles), this reduces to min(dist[i][k], dist[i][k]), which changes nothing. The same logic applies to every cell in column k.

Row k and column k of the matrix are frozen throughout iteration k. So the 2D in-place update produces exactly the same result as the 3D formulation. Space drops from O(V³) to O(V²).

A Four-Vertex Example, Step by Step

Directed weighted graph with four vertices. Direct edge 0 to 3 costs 7 (dashed). The three-hop path 0 to 1 to 2 to 3 costs 3+2+1=6, beating the direct edge. The direct edge 0→3 costs 7. The three-hop path costs 6. Your intuition about "more hops means more cost" is wrong. Floyd-Warshall knows this.

Initial matrix (INF = no direct edge):

      0    1    2    3
 0  [ 0,   3,  INF,  7 ]
 1  [ 8,   0,   2,  INF]
 2  [ 5,  INF,  0,   1 ]
 3  [ 2,  INF, INF,  0 ]

After k=0 (vertex 0 as intermediary): paths routed through vertex 0 populate previously disconnected pairs. dist[1][3] goes from INF to 15 (via 1→0→3). dist[3][1] goes from INF to 5 (via 3→0→1). dist[2][1] goes from INF to 8 (via 2→0→1).

After k=1 (vertex 1 as intermediary): dist[0][2] drops from INF to 5 (via 0→1→2). dist[3][2] becomes 7 (via 3→0→1→2).

After k=2 (vertex 2 as intermediary): the notable updates arrive. dist[0][3] drops from 7 to 6 (via 0→1→2→3). The direct edge loses to the three-hop detour. dist[1][3] drops from 15 to 3 (via 1→2→3). dist[1][0] drops from 8 to 7 (via 1→2→0).

After k=3 (vertex 3 as intermediary): dist[2][0] drops from 5 to 3 (via 2→3→0). dist[2][1] drops from 8 to 6. dist[1][0] finalizes at 5 (via 1→2→3→0).

Final result:

      0    1    2    3
 0  [ 0,   3,   5,   6 ]
 1  [ 5,   0,   2,   3 ]
 2  [ 3,   6,   0,   1 ]
 3  [ 2,   5,   7,   0 ]

Initial distance matrix versus final all-pairs shortest distance matrix. Green cells show distances that improved. The 0-to-3 entry drops from 7 to 6. Every green cell is a case where the algorithm found a shorter path than the direct edge. The diagonal stays zero throughout.

The shortest path from 0 to 3 is 6, not 7. It goes 0→1→2→3.

Core Operations

Operation	Time	Space	Notes
Initialize distance matrix	O(V²)	O(V²)	One pass over all pairs
Main DP loop	O(V³)	O(1) extra	Three nested loops, O(1) per triple
Negative cycle detection	O(V)	O(1)	Check diagonal after main loop
Build path reconstruction matrix	O(V³)	O(V²)	Maintain `nxt[i][j]` alongside `dist[i][j]`
Retrieve one shortest path	O(V)	O(V)	Follow `nxt` pointers from source

The O(V³) bound is tight. There are V choices for k, V choices for i, V choices for j, and O(1) work per triple: one addition, one comparison. No shortcuts exist; you must consider all V³ combinations.

The constant factor is small. V = 500 gives 125 million iterations, which modern hardware handles comfortably in under a second. V = 1000 gives 1 billion iterations, which takes several seconds. That's a long time to stare at a loading spinner, so the practical cutoff for competitive programming is around V = 400-500.

Negative cycle detection is free. After the main loop, scan the diagonal. If dist[i][i] < 0 for any i, there's a negative cycle involving vertex i. This costs O(V), invisible next to the O(V³) main loop.

The negative cycle caveat is worth flagging. If a negative cycle exists, the distances for vertices reachable from that cycle are technically negative infinity. Floyd-Warshall does not propagate this correctly in all cases. For graphs where you only care about pairs that are provably not affected by a negative cycle, the algorithm gives correct answers for those pairs. But in practice: detect the cycle, signal an error, and let the caller decide. Don't silently return a distance of negative twelve billion and call it a day.

Path reconstruction adds O(V²) space and zero extra time complexity. Alongside dist[i][j], maintain nxt[i][j] (the first hop on the shortest path from i to j). When you update dist[i][j] via vertex k, set nxt[i][j] = nxt[i][k]. Reconstruction then follows nxt pointers in O(V) per query.

For dense graphs, Floyd-Warshall is faster than running Dijkstra V times (O(V³) vs O(V³ log V)). For sparse graphs with positive weights, V calls to Dijkstra win: O(VE log V) with a binary heap. For sparse graphs with negative weights, Johnson's algorithm (Bellman-Ford reweighting plus V Dijkstra passes) runs in O(VE log V + VE) and beats Floyd-Warshall when E is much smaller than V².

One Structure, Every Language

Three nested loops. One min. Fourteen languages. The algorithm looks almost too simple for what it actually does. That's the point.

import math
from typing import Optional

INF = math.inf


def floyd_warshall(graph: list[list[float]]) -> list[list[float]]:
    n = len(graph)
    dist = [row[:] for row in graph]

    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k

    return dist


def has_negative_cycle(dist: list[list[float]]) -> bool:
    return any(dist[i][i] < 0 for i in range(len(dist)))


# With path reconstruction
def floyd_warshall_with_paths(
    graph: list[list[float]],
) -> tuple[list[list[float]], list[list[Optional[int]]]]:
    n = len(graph)
    dist = [row[:] for row in graph]
    nxt: list[list[Optional[int]]] = [
        [j if graph[i][j] < INF and i != j else None for j in range(n)]
        for i in range(n)
    ]

    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]

    return dist, nxt


def get_path(
    nxt: list[list[Optional[int]]], src: int, dst: int
) -> Optional[list[int]]:
    if nxt[src][dst] is None:
        return None
    path = [src]
    while path[-1] != dst:
        path.append(nxt[path[-1]][dst])  # type: ignore[arg-type]
    return path


# Example
graph = [
    [0,   3,   INF, 7  ],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1  ],
    [2,   INF, INF, 0  ],
]
dist, nxt = floyd_warshall_with_paths(graph)
print(dist[0][3])           # 6.0
print(get_path(nxt, 0, 3))  # [0, 1, 2, 3]

The INF Overflow Bug Nobody Warns You About

This is the most common implementation mistake. If you represent "no edge" with Integer.MAX_VALUE in Java (or INT_MAX in C/C++), then dist[i][k] + dist[k][j] overflows to a large negative number when both values are INF. Your shortest-path table silently fills with garbage.

No warning. No exception. Just confident, incorrect output.

Use INF = Integer.MAX_VALUE / 2 in Java and C#, or INF = 1e9 in C and C++. Either value is large enough that no real path cost comes close, and adding two of them doesn't overflow. The guard if (dist[i][k] < INF && dist[k][j] < INF) is an equally valid fix and arguably more readable. Use one or the other, not neither.

Languages with native float infinity (Python, JavaScript, TypeScript, Ruby) sidestep this entirely. Infinity + Infinity == Infinity in IEEE 754. The JVM crowd is not so lucky.

What Problems It Solves

All-Pairs Shortest Paths

The obvious one. A precomputed distance table is useful anywhere you need to answer many arbitrary source-destination queries: navigation systems, offline routing table computation, flight search that has to evaluate indirect connections.

Transitive Closure

Warshall's original formulation. Replace min with || and + with &&:

def transitive_closure(adj: list[list[bool]]) -> list[list[bool]]:
    n = len(adj)
    reach = [row[:] for row in adj]
    for i in range(n):
        reach[i][i] = True  # every vertex reaches itself

    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])

    return reach

After this runs, reach[i][j] is True if and only if there's any path from i to j. Same loop structure, different algebra. For small graphs where you'll answer many reachability queries, this is the move.

Detecting Arbitrage

Given a matrix of currency exchange rates, can you make money by cycling through currencies? Transform: dist[i][j] = -log(rate[i][j]). Now a cycle with a profit (product of rates above 1) has log-sum below 0. Run Floyd-Warshall and check whether any dist[i][i] < 0.

In practice, real exchanges close these windows in milliseconds. But knowing the algorithm is still fun. And occasionally a job interview question.

Three currencies USD, EUR, GBP with exchange rates on directed edges. The GBP-to-USD rate closes a profitable cycle, shown in red. The sum of negative-log weights is negative. Transform exchange rates into edge weights via -ln(rate). A profitable cycle (product of rates above 1) becomes a negative-weight cycle. Floyd-Warshall finds it for you.

Graph Diameter and Eccentricity

After running Floyd-Warshall, max over all (i,j) of dist[i][j] is the graph diameter. The eccentricity of vertex i is max over j of dist[i][j]. The radius is min over i of eccentricity(i). All of these fall out of the complete distance table for free.

Bottleneck (Minimax) Paths

Replace (min, +) with (min, max): dist[i][j] = min(dist[i][j], max(dist[i][k], dist[k][j])). Now the distance between two vertices is the minimum possible maximum edge weight on any path between them. Useful in bandwidth-constrained networks and mountain pass routing.

When to Use the Floyd-Warshall Algorithm

Four signals that point toward Floyd-Warshall:

"Find the shortest path between all pairs" appears literally in the problem, or you need to answer many arbitrary (source, destination) distance queries.
The graph is small. V ≤ 100 in most interview problems, V ≤ 400 in competitive programming. If V = 10,000, you need a different tool.
Edge weights can be negative, but the problem guarantees no negative cycles (or asks you to detect them).
The problem smells like reachability for all pairs, not just single-source, and you need results offline.

Worked Example 1: LeetCode 1334

Problem: There are n cities numbered 0 to n-1. You're given a list of bidirectional weighted roads and a distanceThreshold. For each city, count how many other cities are reachable within distanceThreshold distance. Return the city with the smallest count (ties go to the largest index).

Why Floyd-Warshall fits: The problem explicitly asks for distances between all pairs. Constraints: n ≤ 100. That's a dead giveaway. Dijkstra from each node would also work but Floyd-Warshall is three lines of nested loops and requires no priority queue setup.

def findTheCity(n: int, edges: list[list[int]], distanceThreshold: int) -> int:
    import math
    INF = math.inf

    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = w
        dist[v][u] = w

    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    best_city, best_count = -1, n + 1
    for i in range(n):
        count = sum(1 for j in range(n) if i != j and dist[i][j] <= distanceThreshold)
        if count <= best_count:
            best_count = count
            best_city = i

    return best_city

The loop if count <= best_count (not <) handles the tie-breaking rule: among equally connected cities, return the one with the largest index. Iterating i from 0 to n-1 and using <= ensures the last equally minimal city wins.

Worked Example 2: LeetCode 1462

Problem: You have numCourses courses. Some courses are prerequisites of others (edges in a DAG). For each query [u, v], answer whether course u is a direct or indirect prerequisite of course v.

Why Floyd-Warshall fits: This is transitive closure. You need to answer many reachability queries on a small graph (numCourses ≤ 100). Build the reachability matrix once, answer each query in O(1).

def checkIfPrerequisite(
    numCourses: int,
    prerequisites: list[list[int]],
    queries: list[list[int]],
) -> list[bool]:
    reach = [[False] * numCourses for _ in range(numCourses)]

    for u, v in prerequisites:
        reach[u][v] = True
    for i in range(numCourses):
        reach[i][i] = True

    for k in range(numCourses):
        for i in range(numCourses):
            for j in range(numCourses):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])

    return [reach[u][v] for u, v in queries]

The structure is identical to the shortest-path version. Swap min with or and + with and. The same inductive argument applies: after processing intermediary k, reach[i][j] is True if there's a path from i to j using only vertices from {0,...,k}.

Small DAG of five courses with prerequisite edges on the left. On the right, a reachability matrix showing how setting k equals 2 discovers that courses 0 and 1 can reach courses 3 and 4 through course 2. k=2 is the key step. Once course 2 becomes an allowed intermediary, the matrix propagates reachability two hops forward. Same three loops, different algebra.

The Generalizations Worth Knowing

Floyd-Warshall is a template, not a single algorithm. The pattern is: three nested loops over k, i, j, and a "relaxation" step that combines two partial results. Swap the algebra and you get a different algorithm entirely. Same skeleton. Different universe of problems.

Algebra	Combines via	Initializes via	Solves
(min, +)	min	edge weights / INF	Shortest paths
(or, and)	or	adjacency booleans	Transitive closure
(min, max)	min	edge weights / INF	Bottleneck paths
(max, min)	max	edge weights / 0	Widest paths
(max, *)	max	exchange rates / 0	Longest product path

Every row in this table uses the same three-nested-loops skeleton.

Recap

State: dist[i][j] = shortest path from i to j using vertices {0,...,k} as allowed intermediaries, where k grows from 0 to V-1 in the outer loop.
Recurrence: dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]). The path either avoids k or splits at k.
Loop order is non-negotiable: k outermost. Violating this breaks the invariant.
In-place 2D update is safe because row k and column k are frozen during iteration k (dist[k][k] = 0 makes the self-update a no-op).
Time O(V³), space O(V²). The 3D formulation is equivalent but wastes memory.
Negative cycle detection is free: check dist[i][i] < 0 after the main loop.
INF overflow kills integer implementations. Use MAX_VALUE / 2 or 1e9, not MAX_VALUE.
The template generalizes. Swap (min, +) for (or, and) and you have transitive closure. Swap for (min, max) and you have bottleneck paths.
Use it when V is small, you need all-pairs distances or reachability, and negative edges might be present.
Don't use it when V is large or the graph is sparse with positive weights. Prefer V calls to Dijkstra, or Johnson's algorithm if negative edges are present.

Understanding the DP state is one thing. Seeing that a problem wants all-pairs distances under time pressure is another. SpaceComplexity runs live voice-based mock interviews with rubric feedback, so you can practice explaining the loop-order reasoning out loud, not just to yourself.

For the single-source view, Dijkstra's algorithm is the natural companion: once you understand both, reasoning about when the cubic overhead is worth it becomes straightforward. If the graph is unweighted, BFS vs DFS covers when a single traversal is enough. For foundational graph representation, graph data structure and adjacency matrix is where to start.