Dijkstra vs A*: One Line of Code Separates Them

You have a graph. You want the shortest path. You reach for Dijkstra. Your friend says use A*. Neither of you is wrong, and neither of you can fully explain the difference without waving your hands a little. The Dijkstra vs A* decision comes down to one question: do you have a heuristic, and do you trust it.

A is Dijkstra with one extra term in the priority function.* Set that term to zero and A* becomes Dijkstra, visiting the same nodes in the same order. Make that term good and A* ignores most of the graph while still returning the optimal path. That's the whole story. The rest is understanding what "good" means.

The entire difference between these two algorithms lives in one term.

Both Algorithms Share the Same Structure

Dijkstra and A* maintain a priority queue of nodes to explore, sorted by cost. Both start at the source with cost 0 and relax edges by checking whether a newly discovered path to a neighbor beats the best known one.

The difference lives entirely in the priority function. Dijkstra sorts by g(n), the exact cost to reach node n from the source, as defined in Dijkstra's 1959 paper in Numerische Mathematik. A* sorts by f(n) = g(n) + h(n), where h(n) is a heuristic estimate of the remaining cost to the goal. That formulation is introduced and proved correct in Hart, Nilsson, and Raphael's 1968 paper, "A Formal Basis for the Heuristic Determination of Minimum Cost Paths."

import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        g, u = heapq.heappop(heap)
        if g > dist.get(u, float('inf')):
            continue  # stale entry
        for v, w in graph[u]:
            if g + w < dist.get(v, float('inf')):
                dist[v] = g + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def astar(graph, source, goal, h):
    g_score = {source: 0}
    heap = [(h(source), source)]
    while heap:
        f, u = heapq.heappop(heap)
        if u == goal:
            return g_score[goal]
        g = g_score[u]
        if f > g + h(u):
            continue  # stale entry
        for v, w in graph[u]:
            new_g = g + w
            if new_g < g_score.get(v, float('inf')):
                g_score[v] = new_g
                heapq.heappush(heap, (new_g + h(v), v))
    return float('inf')

Dijkstra returns the shortest distance to every reachable node. A* returns the shortest distance to one specific goal, guided by h. The only structural change: h(v) added to the heap priority, and an early exit when the goal is popped.

h = 0 Is Exactly Dijkstra

Set h(n) = 0 for every node. Now f(n) = g(n) + 0 = g(n). The priority queue sorts identically to Dijkstra's, and the two algorithms visit the same nodes in the same order.

This isn't a coincidence. A with a zero heuristic is Dijkstra.* They are the same algorithm at different points on a spectrum parameterized by how much information h provides. Think of Dijkstra as A* that went on vacation and forgot to pack a heuristic.

Larger h means fewer nodes expanded. But cross into inadmissible territory and you lose the optimality guarantee.

The closer h is to the true remaining cost h*, the fewer nodes A* expands. The further it sits from zero, the more work it saves over Dijkstra.

Admissibility Guarantees the Optimal Path

A heuristic is admissible if it never overestimates: h(n) ≤ h*(n) for every node n, where h*(n) is the true shortest cost from n to the goal.

With an admissible heuristic, A* is guaranteed to find the optimal path. This is the central result of Hart, Nilsson, and Raphael (1968). The proof is short and slightly smug.

Suppose A* returns a path to the goal with cost C > C*, where C* is the optimal cost. Since A* stopped at the goal, some node n on the true optimal path must still be in the open list. Its priority is f(n) = g(n) + h(n). Because the path to n is part of the optimal route, g(n) ≤ C*. Because h is admissible, h(n) ≤ h*(n). So f(n) ≤ g(n) + h*(n) ≤ C* < C. A* would have expanded n before the suboptimal goal. Contradiction. Optimal path guaranteed.

Euclidean distance is admissible for road maps because a straight line is always shorter than any real route. Manhattan distance is admissible for grid movement because you can never do better than traveling in a straight grid path. Both facts are obvious in hindsight and feel like cheating.

Consistency Makes Each Node Expand Exactly Once

Admissibility is enough for optimality, but there's a catch. Without consistency, A* can expand the same node multiple times if a cheaper path to it surfaces later. That's expensive. And kind of embarrassing.

A heuristic is consistent (also called monotone) if for every edge from u to v with cost c:

h(u) ≤ c(u, v) + h(v)

This is the triangle inequality applied to estimates. The estimated cost from u should not exceed the cost of stepping through v plus the estimate from v. Consistency implies admissibility, but not vice versa.

The triangle inequality for estimates. If your heuristic violates this, nodes can get re-expanded, which is wasteful and also ruins the party.

When h is consistent, the f-values of nodes popped from the heap are non-decreasing, exactly like Dijkstra's g-values. Each node is expanded at most once. The closed set is safe. You get Dijkstra's "finalize on pop" guarantee plus A*'s heuristic guidance.

In practice, most naturally constructed heuristics (Euclidean distance, Manhattan distance) are consistent. The admissible-but-not-consistent case is mostly a theoretical concern for people who enjoy inventing edge cases at 2am.

See How the Expansions Diverge on the Same Graph

Consider this graph. You want the shortest path from S to G.

The optimal path is S→A→G, cost 5. The path S→B→G costs 12.

Same graph. Dijkstra checks every door in the building. A walks straight to the one it wants.*

Dijkstra expands by g-value:

Pop S (g=0). Push A (g=1), B (g=10).
Pop A (g=1). Push G (g=5).
Pop G (g=5). Done. Returns 5.

Three expansions. Fine for a toy graph. Scale to a 10,000-node city map and Dijkstra expands every reachable node before it commits to the goal.

A* with h = straight-line distance to G:

h(S) = 5.1, h(A) = 3.8, h(B) = 2.0, h(G) = 0

Pop S (f = 0 + 5.1 = 5.1). Push A (f = 1 + 3.8 = 4.8), B (f = 10 + 2.0 = 12.0).
Pop A (f=4.8). Push G (f = 5 + 0 = 5.0).
Pop G (f=5.0). Done.

Same result, same number of expansions on this tiny example. B never gets touched. On a real map, A* avoids expanding nodes pointing away from the goal entirely. On graphs with good geometric heuristics, A often expands a small fraction of the nodes Dijkstra would touch for a single-target query.* The exact ratio depends on the heuristic and the graph: Hart, Nilsson, and Raphael (1968) formalized this with the dominance result below.

A Better Heuristic Always Expands Fewer Nodes

If you have two admissible heuristics h1 and h2 where h1(n) ≥ h2(n) for all n, h1 dominates h2. A* with h1 expands a subset of the nodes A* with h2 expands. This is the dominance theorem from Hart, Nilsson, and Raphael (1968).

This gives you a concrete lever for choosing between heuristics. For the 8-puzzle, Manhattan distance (sum of each tile's distance from its goal position) dominates misplaced-tiles count (tiles not in their goal position). Manhattan is always at least as large and never overestimates. Use Manhattan.

The ideal heuristic is h = h* (perfect knowledge of the remaining cost). Then A* expands exactly the nodes on the optimal path and nothing else. In practice, h* is usually the thing you're trying to compute, so you approximate as tightly as possible while staying admissible. It's heuristics all the way down.

The Time Cost Depends on Your Heuristic, Not the Graph Size

A's space is the painful part.* It stores every generated node in the open list. On large graphs with weak heuristics, that blows up to O(b^d), exponential in solution depth. You asked for a shortest path and got an out-of-memory error. Congratulations.

IDA* (iterative-deepening A*), introduced by Korf (1985), fixes this by running DFS with an f-threshold and raising the threshold each iteration. Space drops to O(d). The tradeoff is redundant work: nodes near the source get re-expanded on every iteration. For problems where memory is the bottleneck, IDA* wins. For problems where you can afford O(V) memory, standard A* is faster.

When to Use Dijkstra vs A*

Use Dijkstra when:

You need shortest paths from one source to all (or many) destinations. A*'s heuristic only helps for a specific goal. If you want every distance, heuristic guidance buys nothing.
No good heuristic exists. Abstract graphs, dependency graphs, social networks: there's no geometric estimate of "distance to goal."
You're implementing a network routing protocol. OSPF (Open Shortest Path First) runs Dijkstra because routers compute full routing tables, not single-destination paths.
You want the simplest correct implementation. Dijkstra has fewer moving parts and no heuristic to get wrong.

Use A* when:

You have a specific source and a specific destination.
The graph has geometric structure: maps, grids, game worlds, anything where spatial distance is a useful proxy for path cost.
Speed matters more than generality. Game AI, real-time navigation, robotics.
You can construct an admissible heuristic. If you can, the speedup is usually worth it.

Use weighted A* (f = g + w*h, w > 1) when you're willing to trade a bounded amount of optimality for speed. The solution is guaranteed to cost at most w times the optimal. This is the basis for ARA* (Anytime Repairing A*), introduced by Likhachev, Gordon, and Thrun at NIPS 2003. It starts with a large w for a fast initial solution and reduces w over time, reusing prior search effort each round. It's the algorithm equivalent of shipping a rough draft and iterating.

Quick Recap

A* adds h(n) to Dijkstra's priority function. Setting h = 0 recovers Dijkstra exactly.
An admissible heuristic (never overestimates) guarantees A* finds the optimal path.
A consistent heuristic (triangle inequality on edges) additionally ensures each node is expanded once.
Heuristic dominance: larger admissible h means fewer nodes expanded.
Dijkstra is the right default when no heuristic exists or when you need all-destinations distances.
A* is the right choice for single-target search on graphs with geometric structure.
A*'s space is O(b^d). IDA* solves this at the cost of redundant node re-expansion.

In interviews, the mechanics rarely trip people up. Narrating the tradeoff does. Why does adding one term to the priority function change the problem from "explore everything" to "carve a corridor to the goal"? Explaining that clearly, under pressure, in under two minutes, is what separates a good answer from a great one. If you want to practice exactly that kind of analysis, SpaceComplexity runs voice-based mock interviews with rubric-based feedback on your reasoning, not just your code.