Bellman-Ford Algorithm: Shortest Paths With Negative Edges

Dijkstra is fast, clean, and elegant. It also silently produces wrong answers the moment you hand it a negative edge weight. No error. No warning. Just quietly incorrect output, which is the worst kind of incorrect.

The Bellman-Ford algorithm drops that constraint. It handles negative weights, detects negative cycles, and tells you when a graph has no finite shortest path at all. The price: it's slower. O(VE) vs Dijkstra's O((V+E) log V). That's the deal.

The mental model: maintain a distance array, initialize the source to zero and everything else to infinity, then relax every edge V-1 times. After k passes, every shortest path using at most k edges is correct. Since no simple path visits more than V-1 edges, V-1 passes cover everything. Run one more pass. If anything still improves, you have a negative cycle.

Bellman-Ford Is Dynamic Programming

Richard Bellman invented both this algorithm and the DP framework. He also named the framework after himself. Some people name buildings after themselves. Bellman named a paradigm. Efficient.

The connection to DP is structural, not a loose analogy.

Define OPT(k, v) as the minimum-weight path from source s to vertex v using at most k edges:

OPT(0, s)  = 0
OPT(0, v)  = ∞       for all v ≠ s
OPT(k, v)  = min(
               OPT(k−1, v),
               min over all edges (u→v): OPT(k−1, u) + w(u, v)
             )

The algorithm fills this table row by row, and row k depends only on row k−1. The standard implementation uses a single distance array updated in-place, which is equivalent to a rolling DP table. In-place updates let newly relaxed values propagate to downstream vertices within the same pass, so the algorithm often converges in fewer than V-1 iterations. The V-1 bound still holds as a worst-case upper bound.

The strict two-array version (copy the array before each pass) is required when the problem constrains the path to exactly K edges, covered below.

DP table with rows indexed k=0 through k=4 (iterations) and columns A through E (vertices), showing distances filling in row by row with each cell derived from the minimum of the cell above and relaxed edge paths Row k depends only on row k-1. Row 1 is where all the action happens in our example.

Watching It Run

Take this directed graph with five vertices and one negative edge:

A → B  (weight 4)
A → C  (weight 2)
C → B  (weight −3)
B → D  (weight 3)
C → D  (weight 5)
D → E  (weight 1)

Directed graph with five vertices A through E and six labeled edges. The negative edge C to B (weight minus 3) is highlighted in amber with a dashed style. Final distances from A are shown: A=0, B=-1, C=2, D=2, E=3 The C to B edge (weight -3) is the troublemaker. Dijkstra would already have "settled" B at 4 before seeing this edge.

Initial state: A=0, B=∞, C=∞, D=∞, E=∞

Pass 1 (relax all edges in edge-list order):

A→B: B = min(∞, 0+4) = 4
A→C: C = min(∞, 0+2) = 2
C→B: B = min(4, 2−3) = −1 (negative edge improves B mid-pass)
B→D: D = min(∞, −1+3) = 2 (uses B's already-updated value)
C→D: D = min(2, 2+5) = 2 (no change)
D→E: E = min(∞, 2+1) = 3

After pass 1: A=0, B=−1, C=2, D=2, E=3

Pass 2: all edges check with no improvement. Early termination fires.

Final distances: A=0, B=−1, C=2, D=2, E=3.

Distance array evolution showing three states: Initial (all infinity except A=0), After Pass 1 (B improves from infinity to -1 via the negative edge, C=2, D=2, E=3), After Pass 2 (no changes, algorithm converged) The in-place update is why B's improved value immediately benefits D and E within the same pass. Two-array version would need a second pass for that.

The in-place update is why B's improved value immediately benefits D and E in the same pass. In the strict two-array version this would require a second pass.

Why V-1 Passes Are Enough. The Proof.

Claim: after k iterations, d[v] equals the weight of the shortest path from s to v using at most k edges.

Proof by induction on k.

Base case (k=0): d[s]=0 and d[v]=∞ for all v≠s. Correct, because the only 0-edge path from s leads back to s.

Inductive step: Assume after k−1 iterations, d[u] equals the length of the shortest path from s to u using at most k−1 edges. Consider the shortest path P from s to v using exactly k edges: P = (s, ..., u, v), where (u, v) is the last edge. The sub-path from s to u uses k−1 edges. By the inductive hypothesis, d[u] is correct after k−1 iterations. In iteration k, the algorithm examines edge (u, v) and sets d[v] = min(d[v], d[u] + w(u,v)). That value equals the length of P.

Inductive proof visualization: a path from s through intermediate nodes to u to v. The sub-path of k-1 edges is boxed, labeled as correct by inductive hypothesis. The final edge u to v is relaxed in iteration k. A callout shows: simple paths have at most V-1 edges, so V-1 iterations suffice. The inductive step in one picture. The V-1 upper bound follows because simple paths can't repeat vertices.

If no negative cycles are reachable from the source, every shortest path is a simple path, so V-1 iterations are sufficient. A simple path visits each vertex at most once, so any simple path in a V-vertex graph has at most V-1 edges.

One Extra Pass Finds the Negative Cycle

After V-1 iterations, run a V-th pass over all edges. If any edge (u, v) can still be relaxed, a negative cycle is reachable from the source.

Why? If no negative cycle exists, V-1 iterations are enough and nothing changes in the V-th pass. A V-th improvement means some distance is still decreasing. That only happens if there's a cycle you can traverse repeatedly to lower cost indefinitely.

To extract the actual cycle:

After the V-th pass, take any vertex x that was still relaxed.
Walk back V steps through predecessor pointers. This guarantees you've entered the cycle rather than approaching it along a path leading to it.
From that point, follow predecessors until you return to the same vertex.

def find_negative_cycle(edges, num_vertices, source):
    distance = [float('inf')] * num_vertices
    predecessor = [-1] * num_vertices
    distance[source] = 0

    last_relaxed = -1
    for _ in range(num_vertices):  # V total iterations
        last_relaxed = -1
        for u, v, weight in edges:
            if distance[u] != float('inf') and distance[u] + weight < distance[v]:
                distance[v] = distance[u] + weight
                predecessor[v] = u
                last_relaxed = v

    if last_relaxed == -1:
        return None  # No negative cycle reachable

    # Walk back V steps to guarantee we enter the cycle
    x = last_relaxed
    for _ in range(num_vertices):
        x = predecessor[x]

    # Trace cycle from x until we return to x
    cycle = []
    current = x
    while True:
        cycle.append(current)
        current = predecessor[current]
        if current == x:
            break
    cycle.append(x)
    return cycle[::-1]

Four-vertex graph with a negative cycle. Source connects to A, then A to B (weight 5). The cycle B to C (weight 4), C to D (weight -3), D to B (weight -6) has total weight -5. After V-1 passes, the V-th pass still relaxes D. Predecessor trace from D finds the cycle B to C to D and back to B. Cycle B→C→D→B totals to -5. The algorithm keeps wanting to improve D forever. That's the tell.

When You Need Exactly K Edges

The DP framing reveals a powerful variant. If a problem asks for the shortest path using at most K edges, run K+1 iterations. But here you must use the strict two-array approach: snapshot the distance array before each pass, and write all updates into the snapshot.

Without the snapshot: a single pass over A→B, B→C, C→D can update D using B's value which was just updated in the same pass, effectively counting 3 hops in one "iteration." That breaks the K-edge constraint.

The temp = distance[:] copy is the entire difference between standard Bellman-Ford and the K-hops variant. This is the solution structure for LeetCode 787 (Cheapest Flights Within K Stops).

def cheapest_with_k_edges(num_vertices, edges, source, destination, k):
    distance = [float('inf')] * num_vertices
    distance[source] = 0

    for _ in range(k + 1):
        temp = distance[:]  # Snapshot before this pass
        for u, v, weight in edges:
            if distance[u] != float('inf') and distance[u] + weight < temp[v]:
                temp[v] = distance[u] + weight
        distance = temp

    return -1 if distance[destination] == float('inf') else distance[destination]

How Much Does It Cost?

Operation	Time	Space	Notes
Initialize	O(V)	O(V)	Set `d[source]=0`, rest `∞`
Relax all edges (one pass)	O(E)	O(1)	Visit every edge once
Full run (V-1 passes)	O(VE)	O(V)	Outer loop V-1, inner loop E
Early termination (best case)	O(E)	O(V)	First pass makes no changes
Negative cycle check	O(E)	O(1)	One additional pass
Negative cycle extraction	O(V)	O(V)	V hops + cycle trace
SPFA (queue-based variant)	O(E) avg, O(VE) worst	O(V)	No adversarial guarantee

The O(VE) bound holds because the outer loop runs V-1 times and each pass visits all E edges: (V-1) × E = O(VE).

Space is O(V) for the distance array and the optional predecessor array. The edge list is O(E) input that does not count against auxiliary space.

The early termination optimization: if a complete pass over all E edges produces zero updates, the algorithm has converged and can stop immediately. On graphs where shortest paths are shallow (few hops from source to every vertex), the algorithm often terminates after just a few passes.

SPFA (Shortest Path Faster Algorithm) is a queue-based variant. Instead of visiting all E edges on every pass, maintain a queue of vertices whose distances just improved and process only their outgoing edges. Average-case behavior is O(E), but adversarial graphs (grids, specific dense structures) still hit O(VE). SPFA is popular in competitive programming and that's fine. In a production system where you don't control the inputs, assume someone is trying to hit your worst case.

Dijkstra or Bellman-Ford? Here's How to Choose.

	Dijkstra (binary heap)	Bellman-Ford
Negative edge weights	No	Yes
Negative cycle detection	No	Yes
Time complexity	O((V+E) log V)	O(VE)
Dense graph	O(V² log V)	O(V³)
DAG with negative edges	Topo sort + 1 pass: O(V+E)	V-1 passes (overkill)
At most K edges constraint	No clean fit	K+1 iterations with snapshot
Practical speed (no negatives)	Much faster	Slower

One underused shortcut: if your graph is a DAG (no cycles) and has negative edges, skip Bellman-Ford entirely. Topological sort the DAG in O(V+E), then relax edges in topological order exactly once in O(E). Total: O(V+E), with full negative-edge support and no wasted passes.

One Structure, Every Language

The core implementation is an edge list plus a distance array. No priority queue, no adjacency list. Just iterate the edges.

from typing import Optional

def bellman_ford(
    edges: list[tuple[int, int, int]],
    num_vertices: int,
    source: int,
) -> Optional[list[float]]:
    distance = [float('inf')] * num_vertices
    distance[source] = 0

    for _ in range(num_vertices - 1):
        updated = False
        for u, v, weight in edges:
            if distance[u] != float('inf') and distance[u] + weight < distance[v]:
                distance[v] = distance[u] + weight
                updated = True
        if not updated:
            break

    for u, v, weight in edges:
        if distance[u] != float('inf') and distance[u] + weight < distance[v]:
            return None  # Negative cycle detected

    return distance

When Should You Actually Use This?

Shortest paths with negative edge weights

The direct use case. If any edge weight is negative and you need shortest paths from one source to all vertices, Bellman-Ford is the standard answer. Dijkstra's settled-node invariant breaks the moment a negative edge can retroactively improve a "done" vertex. There's no patch. You need a different algorithm.

Negative cycle detection

Any problem that asks whether traversing a cycle can reduce cost indefinitely. Financial arbitrage is the canonical application. Constraint satisfaction problems modeled as graphs also use cycle detection to find infeasible regions.

Shortest paths with a hop limit

"Find the minimum-cost path using at most K edges" maps cleanly to running K+1 Bellman-Ford iterations with the snapshot approach. LeetCode 787 (Cheapest Flights Within K Stops) is the standard interview instance of this class.

Distance-vector routing

RIP (Routing Information Protocol) is a direct distributed implementation of Bellman-Ford. Each router maintains a distance vector (best known cost to each destination) and exchanges it with neighbors. The local update rule is exactly one Bellman-Ford relaxation step.

It powered early ARPANET routing in the 1970s and still runs in small networks. This is somewhat unsettling once you understand what happens when a link fails.

Meme: man saying 'plugs router into router, refuses to elaborate further, leaves' Distributed Bellman-Ford, when a link goes down. The routers exchange increasingly wrong information with the confidence of people who have never been more certain of anything.

One caveat: distributed Bellman-Ford can suffer from "count-to-infinity" after a link failure. Two routers, each convinced the other has a valid route, increment their cost estimates by 1 every time they talk. It's the network equivalent of two people arguing who should pay the bill by each saying "I'll get it next time," forever. The cost converges to infinity. Modern protocols fix this with route poisoning, split horizon, or by switching to link-state algorithms (OSPF, IS-IS) that flood the full topology instead.

Four Signals It's Bellman-Ford

Signal 1: Negative edge weights are explicitly present. Dijkstra will silently produce wrong answers. Any graph where costs can go negative needs Bellman-Ford.

Signal 2: The problem asks whether a negative cycle exists, or whether a loop produces unbounded gain. The V-th pass check gives this for free.

Signal 3: "At most K hops / stops / transfers / edges" appears. This constraint maps directly to K+1 iterations with the snapshot copy. Dijkstra's algorithm has no clean way to enforce a hop count.

Signal 4: The graph is small enough. Bellman-Ford's O(VE) is substantially slower than Dijkstra for equivalent graphs. If V=10,000 and E=100,000, Dijkstra takes roughly 1.7M operations and Bellman-Ford takes 10^9. Check the constraint block before choosing.

Worked Example: Cheapest Flights Within K Stops (LeetCode 787)

You have n cities connected by directed flights with prices. Find the cheapest price from src to dst using at most k stops (k+1 edges).

Why Bellman-Ford fits: the hop bound maps directly to K+1 iterations. Dijkstra has no natural way to track remaining allowed edges. BFS ignores prices. Dynamic programming on the flight graph is exactly Bellman-Ford with the snapshot copy.

The snapshot is the whole implementation insight. Without it, a chain of cities can all update in one iteration, violating the edge count constraint.

def find_cheapest_price(n: int, flights: list, src: int, dst: int, k: int) -> int:
    distance = [float('inf')] * n
    distance[src] = 0

    for _ in range(k + 1):
        temp = distance[:]  # Snapshot: prevent same-pass cascades
        for from_city, to_city, price in flights:
            if distance[from_city] != float('inf') and distance[from_city] + price < temp[to_city]:
                temp[to_city] = distance[from_city] + price
        distance = temp

    return -1 if distance[dst] == float('inf') else distance[dst]

Worked Example: Currency Arbitrage

Given an n×n matrix of exchange rates between currencies, determine whether a profitable arbitrage cycle exists.

Transform: for each exchange from currency i to currency j at rate r(i,j), create a directed edge with weight −ln(r(i,j)). A sequence of trades that multiplies your money by more than 1 corresponds to a path where the product of rates exceeds 1, which means the sum of −ln(rate) values is negative. A negative cycle in this graph means arbitrage.

Concretely: $100 USD at rate 0.92 gives €92. €92 at rate 160.5 gives ¥14,766. ¥14,766 at rate 0.0078 gives $115.18. The cycle of weights is −ln(0.92) − ln(160.5) − ln(0.0078) ≈ 0.083 − 5.08 + 4.85 = −0.15 < 0. Bellman-Ford detects it.

Initialize all distances to 0 (equivalent to a virtual source connected to every currency vertex) so cycles reachable from any starting currency are detected.

Currency arbitrage diagram: USD to EUR to JPY and back to USD. Each directed edge shows the exchange rate and its negative log value. The cycle sums to -0.15, which is negative, confirming an arbitrage opportunity. The profit of 15.2% is annotated. $100 in, $115.18 out. The SEC does not find this charming.

import math

def has_arbitrage(num_currencies: int, rates: list[list[float]]) -> bool:
    edges = []
    for i in range(num_currencies):
        for j in range(num_currencies):
            if i != j and rates[i][j] > 0:
                edges.append((i, j, -math.log(rates[i][j])))

    # All distances start at 0: virtual source reachable from any currency
    distance = [0.0] * num_currencies

    for _ in range(num_currencies - 1):
        for u, v, weight in edges:
            if distance[u] + weight < distance[v]:
                distance[v] = distance[u] + weight

    for u, v, weight in edges:
        if distance[u] + weight < distance[v]:
            return True  # Negative cycle = arbitrage opportunity

    return False

Quick Recap

Bellman-Ford is dynamic programming on a graph: OPT(k, v) = min over all edges into v of OPT(k-1, u) + w(u,v).
After k passes, every shortest path using at most k edges is correct (proved by induction on path length).
V-1 passes are sufficient because no simple path has more than V-1 edges.
A V-th pass that still relaxes any edge proves a negative cycle is reachable from the source.
Cycle extraction: from the last-relaxed vertex, follow predecessors V steps to enter the cycle, then trace back to the same vertex.
The K-edges variant runs K+1 iterations with a snapshot copy per pass; without the copy, multi-hop chains collapse into a single iteration.
Use Dijkstra when weights are non-negative. Use Bellman-Ford when they are not, or when you need hop limits or cycle detection.
For DAGs with negative edges, topological sort plus one relaxation pass gives O(V+E) with no cycles to worry about.
SPFA (queue-based Bellman-Ford) runs in O(E) on average but hits O(VE) under adversarial inputs.

Coding Bellman-Ford at home is one thing. Explaining your reasoning under interview pressure is another. SpaceComplexity runs voice-based mock interviews on graph algorithms with rubric feedback on both.