Bellman-Ford vs Floyd-Warshall: Same Negative Edges, Different Questions

Cover: Bellman-Ford vs Floyd-Warshall

Dijkstra's algorithm fails the moment a graph has a negative edge. Two algorithms step in: Bellman-Ford and Floyd-Warshall. Both handle negative weights. Both run in polynomial time. The mistake people make is treating them as interchangeable, because they are answering completely different questions.

Bellman-Ford answers: what is the shortest path from one source to every other vertex? Floyd-Warshall answers: for every pair of vertices, what is the shortest path between them? One question or n-squared questions. That single distinction drives every other tradeoff, including when the slower-looking algorithm is actually the right choice.

Why Dijkstra Can't Handle Bad News

Dijkstra works by finalizing vertices greedily. When it pops a vertex u with distance d, it assumes d is already the shortest possible distance to u. That assumption holds only when every edge weight is non-negative, because adding more edges can never make a path shorter.

A single negative edge is enough to break it. Consider: S to A costs 2, S to B costs 5, B to A costs -4. Dijkstra finalizes A at distance 2 immediately. The actual shortest path goes S to B to A at cost 1. One negative edge collapses the entire greedy argument, and no implementation trick repairs it.

Dijkstra failure: one negative edge invalidates the greedy finalization S to A looks like 2. It's actually 1 via B. Dijkstra already filed A under "done" and went home.

Bellman-Ford: Parameterize by Edge Count

Bellman-Ford is a DP where the state is how many edges the path is allowed to use. Define OPT(k, v) as the shortest path from source s to vertex v using at most k edges. The recurrence:

OPT(k, v) = min(
    OPT(k-1, v),                           # skip the k-th edge, keep existing path
    min over edges (u→v): OPT(k-1, u) + w  # extend a (k-1)-edge path by one hop
)

Base case: OPT(0, s) = 0, OPT(0, v) = ∞ for all v ≠ s.

In any graph with V vertices, a simple shortest path visits at most V distinct vertices, so it uses at most V-1 edges. Running V-1 passes is sufficient to compute all true shortest paths.

def bellman_ford(n, edges, source):
    dist = [float('inf')] * n
    dist[source] = 0

    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w

    # V-th pass: negative cycle check
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None  # negative cycle reachable from source

    return dist

The V-th pass is the negative cycle detector. A simple path cannot use more than V-1 edges. If any distance still improves on pass V, the improving path must be revisiting a vertex, which means it is going around a negative-weight cycle.

One subtlety: the in-place update shown above (reading already-updated values in the same pass) converges faster in practice, but the V-1 bound still holds. The exact K-edges constraint, which appears in problems like LeetCode 787 (cheapest flights within K stops), requires a snapshot copy of the previous pass to prevent a single path from consuming multiple hops in one iteration.

Time: O(VE). Space: O(V).

Bellman-Ford passes: distance array converging toward the correct answer Pass 1 sets up the distances. Pass 2 lets the negative edge actually fire. The V-th pass checks for negative cycles and politely returns None if it finds one.

Floyd-Warshall: Parameterize by Intermediate Vertices

Floyd-Warshall uses a completely different parameterization. Define dist[i][j][k] as the shortest path from i to j where the path is allowed to route through only the vertices {0, 1, ..., k} as intermediaries (not counting i and j themselves). The recurrence:

dist[i][j][k] = min(
    dist[i][j][k-1],                       # don't route through vertex k
    dist[i][k][k-1] + dist[k][j][k-1]     # split at vertex k
)

Base case: dist[i][j][-1] = weight(i, j) if the edge exists, ∞ otherwise, and 0 when i = j.

After expanding the set of allowed intermediaries to all V vertices, dist[i][j][V-1] holds the true shortest path between every pair.

The k loop must be outermost. This is not a style preference. When computing whether to route through vertex k, the recurrence needs dist[i][k][k-1] and dist[k][j][k-1], meaning all paths using intermediaries {0..k-1} must already be computed. Swapping the loop order to IJK or IKJ produces silently wrong answers on specific inputs. KIJ is the only correct order.

The In-Place 2D Collapse

The 3D table can be reduced to a 2D array. When you process intermediate vertex k, the values dist[i][k] and dist[k][j] are themselves frozen: routing through k cannot improve the path from i to k or from k to j, since that would require k as an intermediate in a path that already has k as an endpoint. So updates to dist[i][j] are safe in place.

def floyd_warshall(n, edges):
    INF = 10**9  # never use float('inf') or INT_MAX directly
    dist = [[INF] * n for _ in range(n)]

    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = w

    for k in range(n):           # intermediate vertex -- outermost, always
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    return dist

Use 10⁹ as the infinity sentinel, never INT_MAX. Adding two INT_MAX values silently overflows to a negative number in most languages. The comparison then accepts that garbage as a valid improvement, corrupting your distance table with no error or warning. Your algorithm will appear to work. It will return wrong answers on graphs with no negative edges whatsoever. It will do this quietly.

Time: O(V³). Space: O(V²).

Floyd-Warshall: the 3D DP table collapsing into an in-place 2D matrix Conceptually a 3D table, practically a 2D array. The k-th layer is safe to update in place because dist[i][k] and dist[k][j] are frozen during that pass.

A Concrete Example: Same Graph, Both Algorithms

Take four vertices (0, 1, 2, 3) with edges: 0→1 weight 1, 0→2 weight 4, 1→2 weight -3, 1→3 weight 2, 2→3 weight 1.

Bellman-Ford from source 0 converges in two passes. With an edge ordering where 1→2 appears before 0→1 in the list, the negative edge fires first but finds dist[1] = ∞ and cannot improve yet. Pass 1 then sets dist[1] = 1, dist[2] = 4, dist[3] = 3 via vertex 1. End of pass 1: dist = [0, 1, 4, 3]. Pass 2 resolves the chain: 1→2 drops dist[2] to 1+(−3) = −2, then 2→3 drops dist[3] to −2+1 = −1. Final: {0:0, 1:1, 2:−2, 3:−1}.

Floyd-Warshall processes the same graph and fills a 4×4 distance matrix, processing k=0 through k=3. After k=1 (routing through vertex 1), the (0,2) entry drops from 4 to −2, and (0,3) drops from ∞ to 3. The final table encodes every pairwise shortest path.

Bellman-Ford gives you one row of that matrix. Floyd-Warshall builds the whole thing. Both agree on paths from vertex 0. But Floyd-Warshall also tells you the shortest path from vertex 2 to vertex 3, from vertex 1 to vertex 0 (∞ since no return edges exist), and every other pair. Run Bellman-Ford from vertex 0 and you get none of those.

Side-by-side: Bellman-Ford returns a single distance array; Floyd-Warshall returns the full matrix Bellman-Ford: one row. Floyd-Warshall: the whole table. Both agree on row 0. Only one of them bothered with rows 1, 2, and 3.

Complexity: Where the Difference Actually Bites

	Bellman-Ford	Floyd-Warshall
Answers	One source to all	All pairs
Time	O(VE)	O(V³)
Space	O(V)	O(V²)
Negative cycle scope	Reachable from source	Global
K-edge constraint	Yes, with snapshot	No

On a dense graph where E ≈ V², Bellman-Ford is O(V³), the same as Floyd-Warshall. But Floyd-Warshall gives all-pairs for that cost. Running Bellman-Ford independently from every vertex to get all-pairs costs O(V²E) = O(V⁴) on a dense graph, not O(V³). This is a common interview mistake. The interviewer will catch it.

On a sparse graph with negative edges, Bellman-Ford at O(VE) beats Floyd-Warshall at O(V³). For all-pairs on sparse graphs with negative edges, Johnson's algorithm is the right tool: run Bellman-Ford once to compute reweighting potentials, then run Dijkstra from each vertex. Total cost: O(VE + V² log V).

Negative Cycle Detection Is Not Symmetric

Both algorithms detect negative cycles, but they don't catch the same ones.

Bellman-Ford only reports a negative cycle if it is reachable from the chosen source. Suppose vertices A and B form a negative cycle that your source can never reach. Bellman-Ford won't flag it. It will happily return a complete-looking distance array with no indication that something nasty exists elsewhere in the graph.

Floyd-Warshall, after the algorithm completes, lets you check dist[i][i] for every i. Any vertex that can reach itself through a negative cycle will show dist[i][i] < 0. This catches all negative cycles in the graph regardless of which source you care about.

For currency arbitrage detection, where you want to know if any profitable cycle exists anywhere in the graph, Floyd-Warshall's global check is exactly what you need. For shortest-path validity from a specific source, Bellman-Ford's local check is sufficient.

The K-Edges Constraint: Only Bellman-Ford Can Do It

One thing Bellman-Ford can express that Floyd-Warshall structurally cannot: "shortest path using at most K edges."

Floyd-Warshall's recurrence is parameterized over intermediate vertices, not edge count. There is no natural way to add a hop limit to it. The algorithm just doesn't have that dial.

Bellman-Ford's K-pass variant runs exactly K+1 iterations (for K edges allowed) and takes a snapshot of the distance array at the start of each pass. Without the snapshot, in-place updates let a single path consume multiple hops in one iteration, producing paths shorter than K edges. The snapshot enforces the "at most K edges" constraint exactly.

def bellman_ford_k_edges(n, edges, source, k):
    dist = [float('inf')] * n
    dist[source] = 0

    for _ in range(k):
        prev = dist[:]          # snapshot before this pass
        for u, v, w in edges:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w

    return dist

LeetCode 787 (cheapest flights within K stops) is the canonical example. K stops means K+1 edges, so run K+1 passes with the snapshot. Missing the snapshot is the most common bug on that problem. It produces answers that look plausible and are wrong.

When to Use Which One

Decision flowchart: Bellman-Ford vs Floyd-Warshall vs Johnson's The flowchart your interviewer wants you to have internalized. Dense all-pairs goes to Floyd-Warshall. Sparse all-pairs goes to Johnson's. Everything single-source goes to Bellman-Ford. And if someone tries to run Bellman-Ford from every vertex to get all-pairs on a dense graph, that's O(V⁴) and you should say something.

Use Bellman-Ford when you have one source, the graph has negative edges, you need to detect negative cycles reachable from that source, or the problem has a K-edge (K-stop) constraint.

Use Floyd-Warshall when you need shortest paths between all pairs, the graph is small (V up to a few hundred), you want global negative cycle detection, or you need the transitive closure variant (replace the min/+ operations with or/and to check reachability instead of distance).

Use Johnson's algorithm when you need all-pairs on a sparse graph with negative edges and V is large enough that O(V³) is too slow.

The Short Version

Bellman-Ford is edge-count DP: OPT(k,v) = shortest path using at most k edges. O(VE) time, O(V) space.
Floyd-Warshall is intermediate-vertex DP: dist[i][j][k] = shortest path through {0..k}. O(V³) time, O(V²) space.
The parameterization difference is not cosmetic. It determines what constraints each algorithm can express.
Running Bellman-Ford from every vertex on a dense graph costs O(V⁴), not O(V³). Use Floyd-Warshall instead.
Floyd-Warshall's k loop must be outermost or the results are silently wrong.
Bellman-Ford detects negative cycles reachable from source. Floyd-Warshall detects all of them globally.
Never use INT_MAX as infinity in Floyd-Warshall. Use 10⁹ or INT_MAX/2.

If you want to work through graph algorithm tradeoffs in a real interview setting, with an AI interviewer probing exactly these decisions in real time, SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback on algorithm selection, complexity analysis, and communication.

For a deeper look at each algorithm individually, see the Bellman-Ford and Floyd-Warshall reference guides. For the algorithm that breaks when negative edges appear, see Dijkstra's algorithm.