A* Search Algorithm: The Heuristic That Guarantees the Shortest Path

Peter Hart, Nils Nilsson, and Bertram Raphael were trying to make a mobile robot named Shakey navigate a room. The year was 1968. Dijkstra's algorithm could find the shortest path, but it expanded nodes in every direction, blind to the goal. What if you could give the algorithm a compass, a nudge toward the destination, without sacrificing the guarantee of an optimal result? The answer was the A* search algorithm (pronounced "A-star"), and it became one of the most cited algorithms in computer science. Every GPS app, every video game NPC, every robot path planner uses some version of what Shakey's team invented that year. Shakey itself ended up in a museum. A* did not.

Dijkstra, computing every possible route before texting back. A would have just looked at the map and gone.*

The mental model is simple: A* is Dijkstra's algorithm with a heuristic. At each step, A picks the node that looks most promising in terms of total estimated path cost, not just cost so far.* That single change, adding a forward-looking guess to the backward-looking cost, makes A* dramatically faster in practice while preserving the optimality guarantee, as long as the guess obeys one rule.

The Three Numbers Behind Every Node

Every node in A* carries three values:

g(n): the actual cost from the start to n along the best known path
h(n): the heuristic estimate of the remaining cost from n to the goal
f(n) = g(n) + h(n): the estimated total cost of a path through n

A* always expands the node with the smallest f. Dijkstra expands the node with the smallest g. Greedy best-first search expands the node with the smallest h. A is the combination: it cares about where you've been (g) and where you're going (h).*

When h is zero for all nodes, A* is identical to Dijkstra. When g is ignored and only h drives expansion, you get greedy best-first, which is fast but not optimal. A* sits in between, and the trick is picking an h that's informative without being dishonest.

f(n) = g(n) + h(n)
       ┌──────┐   ┌──────────────────┐
       │known │ + │estimated (guess) │
       └──────┘   └──────────────────┘

The blue arrow is what you know. The dashed amber arrow is what you're guessing. f is you being optimistic in a principled way.

How A* Actually Runs

A* maintains two collections:

Open set: nodes discovered but not yet fully processed, stored in a min-heap keyed by f
Closed set: nodes already expanded, stored in a hash set for O(1) lookup

The algorithm runs as follows:

1. Push start onto open with f = h(start)
2. While open is not empty:
   a. Pop node current with lowest f
   b. If current == goal, reconstruct and return path
   c. Skip if current is stale (lazy deletion, see below)
   d. Mark current as closed
   e. For each neighbor of current:
      tentative_g = g[current] + cost(current, neighbor)
      If tentative_g < g[neighbor]:
        g[neighbor] = tentative_g
        came_from[neighbor] = current
        Push (f = tentative_g + h(neighbor), neighbor) onto open
3. Return no path found

The most important implementation detail is lazy deletion. Standard heaps do not support updating the priority of an existing entry. Rather than re-implementing decrease-key (which requires an indexed heap or Fibonacci heap), most practical A* implementations simply push a new entry when a better path is found and skip stale entries when they're popped. You detect staleness by comparing the g value at pop time to g_score[current]: if they differ, the entry is old and you discard it.

It is the algorithmic equivalent of ignoring an old text message. The new version of you already handled it. The stale entry shows up, you check the timestamp, and you move on.

This keeps the code simple and the asymptotic complexity the same.

The open set is a waiting room sorted by who looks most promising. The closed set is everyone who already had their turn.

The Expansion Order, Visualized

Consider a small grid where S is start, G is goal, and # is a wall:

. . . . . . . .
. S . . # . . .
. . . . # . G .
. . . . # . . .
. . . . . . . .

Dijkstra expands in concentric rings outward from S, blind to the wall or the goal location. Roughly circular wavefront.

A with Manhattan distance* skews immediately toward G. It still has to detour around the wall, but the open set stays focused on the promising direction rather than flooding the whole grid.

Dijkstra expansion        A* expansion
(roughly circular)        (focused toward G)

. . . . . . . .           . . . . . . . .
. S . . # . . .           . S→→→# . . .
. . . . # . G .           . ↓ . ↑# . G .
. . . . # . . .           . →→→→# . . .
. . . . . . . .           . . . . . . . .

A finds the same optimal path but touches far fewer nodes.*

Dijkstra on the left: thorough, democratic, blind. A on the right: opinionated, fast, still correct.*

Why A* Doesn't Lie to You: Admissibility

The heuristic h must be admissible: for every node n, h(n) must never exceed the true cost h*(n) from n to the goal.

h(n) ≤ h*(n)   for all n

Admissibility means the heuristic is optimistic. It may underestimate, but it never overestimates. Think of it as a friend who gives directions: they might say "it's about 20 minutes" when it's actually 25, but they'll never tell you it's 10 minutes when it's actually 45.

Theorem (A optimality with admissible h):* If h is admissible, A* returns an optimal path.

Proof sketch: Suppose A* returns a goal G with suboptimal cost g(G) > C* where C* is the optimal cost. Let n be a node on an optimal path that remains in the open set at the moment A* selects G. Since h is admissible:

f(n) = g(n) + h(n) ≤ g*(n) + h*(n) = C* < g(G) = f(G)

So A* would select n before G. Following this argument along the entire optimal path, A* would reach the optimal goal before returning G. Contradiction.

Consistency: The Stronger Condition You Actually Want

Consistency (also called the monotone condition) is a stronger property:

h(n) ≤ cost(n, n') + h(n')   for all edges n → n'

This is the triangle inequality applied to h. If you can get to n' directly with cost c, then the remaining estimate from n to goal via n' should be at most c + h(n').

Every consistent heuristic is admissible. Not every admissible heuristic is consistent.

Consistency gives you something more valuable: when a node is first popped from the open set, its g value is already optimal. No re-expansion needed.

Why? Because consistency implies f is non-decreasing along any path:

f(n') = g(n') + h(n')
      = g(n) + cost(n,n') + h(n')
      ≥ g(n) + h(n)     (by consistency)
      = f(n)

Nodes are popped in non-decreasing f order. The first time a node is popped, it has the lowest possible f, which (combined with non-decreasing f) means the lowest possible g. So the closed set is safe: you never need to revisit a closed node.

With only admissibility (not consistency), A is still optimal but may re-expand a node if a cheaper path to it is discovered after it's been closed.*

Left: admissibility says your guess can be low, not high. Right: consistency says your guess can't jump more than one edge step at a time.

Tighter Heuristics Always Win

If you have two admissible heuristics h1 and h2 with h1(n) ≥ h2(n) for all n, then h1 dominates h2. A with h1 expands a subset of the nodes that A with h2 expands**, so you always prefer the stronger (tighter) heuristic.

For a grid problem with 4-directional movement: Manhattan distance dominates the misplaced-tiles heuristic, which dominates h=0. Manhattan distance is admissible because you must move at least |Δx| + |Δy| steps regardless of obstacles.

How Fast Is A*?

Operation	Time Complexity	Notes
Push to open set	O(log n)	Binary heap insertion
Extract min from open set	O(log n)	Heap extraction
Lookup in closed set	O(1) average	Hash set
Lookup / update g_score	O(1) average	Hash map
Path reconstruction	O(d)	d = path length, follow came_from
Total (graph with V nodes, E edges)	O(E log V)	With consistent h and lazy deletion
Space	O(V)	Open + closed + g_score + came_from

The time bound O(E log V) holds when the heuristic is consistent and you use lazy deletion. Each edge (u, v) generates at most one heap push (when a better path through u is found). Each pop is O(log n) where n ≤ E (the heap can have at most E entries with lazy deletion). Total: O(E log E) = O(E log V) since E ≤ V².

The practical question is not E log V but how many nodes A actually expands.* With a perfect heuristic (h = h*), only O(d) nodes along the optimal path are expanded. With h = 0, you get Dijkstra's O(V + E) node visits (though each takes O(log V) with a heap). Real heuristics give effective branching factors between 1 and the true branching factor b. The goal is to get as close to b* = 1 as possible.

Space is often the binding constraint, not time. A* stores every discovered node in the open or closed set. On large problems (15-puzzle, game maps), this can exhaust memory before the algorithm finishes. The 15-puzzle has a state space of roughly 10^13. Your laptop has about 10^10 bytes of RAM. Math wins. IDA* (Iterative Deepening A*) trades time for space: it uses depth-first search with a threshold on f, incrementing the threshold each iteration. Space is O(d), but nodes get re-visited across iterations. For problems where h is cheap to compute and memory is scarce, IDA* is the right choice.

The Heuristic Makes the Search

The art of A is picking a heuristic.* It must be admissible and ideally consistent. Stronger is better, as long as it stays admissible.

Problem	Admissible Heuristic	Why It Works
Grid, 4-directional	Manhattan distance	Can't move diagonally, so at least
Grid, 8-directional (king moves)	Chebyshev distance	max(
Grid, any angle	Euclidean distance	Straight line is shortest possible
8-puzzle	Sum of Manhattan distances of tiles	Each tile moves independently at minimum
8-puzzle (stronger)	Pattern database	Precomputed optimal cost for tile subsets
Word ladder	Edit distance to goal	Minimum changes needed
General weighted graph	0 (Dijkstra fallback)	Always admissible, gives no guidance

Manhattan distance for a 3×3 tile puzzle:
Goal state:    Current state:   Per-tile Manhattan distances:
1 2 3          1 4 2            tile 2: |0,1 - 0,2| + |0,0 - 0,0| = 1
4 5 6          3 5 6            tile 3: |0,2 - 1,0| = 2 steps
7 8 _          7 _ 8            ...sum = h(state) ≤ h*(state)

The sum-of-Manhattan-distances heuristic for the 8-puzzle is consistent: moving one tile changes h by at most 1, and that tile moved at least 1 step (its cost), so h(n') ≥ h(n) - 1 = h(n) - cost(n,n').

One Structure, Every Language

All implementations below use the lazy deletion approach. The function takes a 2D grid (0 = passable, 1 = wall), a start coordinate, and a goal coordinate. It returns the path as a list of (row, col) tuples/pairs, or signals failure.

import heapq
from typing import Optional

def astar(
    grid: list[list[int]],
    start: tuple[int, int],
    goal: tuple[int, int],
) -> Optional[list[tuple[int, int]]]:
    rows, cols = len(grid), len(grid[0])

    def heuristic(a: tuple[int, int], b: tuple[int, int]) -> int:
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def neighbors(node: tuple[int, int]):
        r, c = node
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                yield (nr, nc)

    g_score: dict[tuple[int, int], int] = {start: 0}
    came_from: dict[tuple[int, int], tuple[int, int]] = {}
    open_heap: list[tuple[int, int, tuple[int, int]]] = [
        (heuristic(start, goal), 0, start)
    ]

    while open_heap:
        _, g_current, current = heapq.heappop(open_heap)

        if current == goal:
            path = []
            while current in came_from:
                path.append(current)
                current = came_from[current]
            path.append(start)
            return path[::-1]

        if g_current > g_score.get(current, float("inf")):
            continue  # stale entry

        for neighbor in neighbors(current):
            tentative_g = g_current + 1
            if tentative_g < g_score.get(neighbor, float("inf")):
                g_score[neighbor] = tentative_g
                came_from[neighbor] = current
                f = tentative_g + heuristic(neighbor, goal)
                heapq.heappush(open_heap, (f, tentative_g, neighbor))

    return None

What Problems A* Solves

A* belongs to the family of best-first heuristic search algorithms. It's the right tool whenever you need the shortest or cheapest path through a state space and you can define a heuristic that estimates remaining cost.

Grid pathfinding. Any map-based navigation: game levels, robot floors, warehouse layouts. The heuristic is geometric (Manhattan or Euclidean distance). This is the use case A* was born for.

Graph pathfinding with geometry. Road networks, floor plans, city maps. Geographic coordinates give you a natural Euclidean heuristic. GPS routing uses A* (or bidirectional A*) over road graphs with hundreds of millions of edges.

Puzzle solving. The 8-puzzle, 15-puzzle, Rubik's cube (with admissible pattern-database heuristics). A* on these state spaces finds optimal solutions that BFS and Dijkstra can't reach in reasonable time.

State space search with domain knowledge. Any problem you can model as: state + transitions + goal + heuristic. Planning problems in AI (STRIPS, PDDL planners), protein folding, automated theorem proving, compiler optimization (register allocation search).

Where A is the wrong call.* No heuristic? Use Dijkstra. All-pairs shortest paths? Floyd-Warshall or Johnson's. Graph changing live? D* Lite. Memory gone before you find the goal? IDA*. A* is not the universal answer, but it's the right answer surprisingly often.

When to Reach for A*

The signal is a specific combination:

You need the optimal path (not just any path)
You have a state space (explicit graph or implicit, explored on the fly)
You have domain knowledge you can encode as a cheap-to-compute, admissible heuristic
The search space is large enough that Dijkstra's blind expansion is too slow

If you're missing condition 3, you have Dijkstra. If you're willing to trade optimality for speed, you have greedy best-first or weighted A*.

Worked Example 1: Game Map with Terrain Costs

Problem. A unit must cross a 100x100 grid from (0,0) to (99,99). Cells are passable (cost 1), forest (cost 3), or water (cost 5). Find the minimum-cost path.

Why A.* There's a spatial structure (coordinates exist), so we have a heuristic: Euclidean distance divided by the minimum terrain cost (1). This is admissible because no path can beat traveling in a straight line at minimum cost. A* uses this to avoid expanding cells that are clearly too far in the wrong direction. Dijkstra would expand 10,000 cells; A* expands far fewer because the heuristic filters.

Non-unit costs. When edge weights vary, change tentative_g = gCurrent + edge_cost in the implementation. The rest stays the same.

Worked Example 2: Word Ladder

Problem. Find the shortest sequence of English words from "cold" to "warm" where each step changes exactly one letter and every intermediate word is valid. (LeetCode 127.)

Why A.* Model words as graph nodes. Two words are adjacent if they differ by one letter (cost 1). The heuristic: count how many characters differ between the current word and the goal. This is admissible because each step changes exactly one character, so you need at least that many more steps. A* guided by this heuristic explores the word space in the direction of the goal word, rather than expanding uniformly from the start.

def word_ladder_heuristic(word: str, goal: str) -> int:
    # Admissible: count character mismatches (each step fixes at most one)
    return sum(a != b for a, b in zip(word, goal))

With BFS, you'd explore every word reachable in k steps before finding the answer at step k+1. A* explores only the words that look like they're heading toward "warm."

Worked Example 3: Robot Motion Planning

Problem. A robot in a 2D plane must reach a goal pose (x, y, angle) with minimum turning cost. States are (x, y, heading). Transitions have non-uniform costs (turning is expensive, driving straight is cheap).

Why A.* The state space is continuous but can be discretized. The heuristic: Euclidean distance to goal (ignoring heading and turning cost). Admissible because driving in a straight line is the cheapest possible movement. A* on this motion planning graph produces paths that are optimal under the turning cost model, which BFS (ignoring costs) cannot do.

Traps That Bite After the Basics

The admissibility/consistency confusion. Most textbook examples use admissible heuristics that also happen to be consistent. But if your heuristic is only admissible (not consistent), A* with a closed set can produce suboptimal results. The fix: either use a consistent heuristic, or re-open closed nodes when a cheaper path is found. In practice, Manhattan distance and Euclidean distance are both consistent, so this rarely bites you in grid problems. It matters for custom heuristics.

The space explosion. A* stores every visited node. For a 1000x1000 grid, that's 10^6 nodes. For the 15-puzzle, the state space is 10^13. A* will exhaust memory on the 15-puzzle long before it finds the goal. IDA* is the answer: same optimality, O(d) space, at the cost of re-expanding nodes on each iteration. For the 15-puzzle, IDA* with the Manhattan distance heuristic is standard.

Tie-breaking matters. When many nodes have equal f, the order you expand them is up to implementation. A common trick: break ties by preferring higher g (closer to goal) or lower h. In a uniform-cost grid this is the difference between a focused beam and a smeared blob. Breaking ties toward higher g keeps the search focused and can dramatically cut wall-time in uniform-cost environments.

Heuristic inflation. Weighted A* (f = g + w*h, w > 1) finds a solution quickly but with cost at most w times optimal. This is fine if you need speed and can tolerate a bounded error. The danger is forgetting you did it and reporting the result as optimal. Which is fine for a game NPC, and a catastrophe for a surgical robot.

Quick Recap

A* maintains a priority queue (open set) and a visited set (closed set). Always expand the node with lowest f = g + h.
g is exact; h is a guess. The sum f estimates the total path cost through that node.
Admissibility (h ≤ h*) guarantees optimality. Consistency (triangle inequality on h) additionally guarantees each node is expanded at most once.
Time complexity is O(E log V) with a binary heap and lazy deletion. Space is O(V).
In practice, heuristic quality controls expansion count. Perfect heuristic = O(d) nodes. Zero heuristic = Dijkstra.
Use A* when you have a large state space, need optimality, and have a computable admissible heuristic.
When memory is the constraint, IDA* trades time for O(d) space.
Weighted A* trades optimality for speed with a bounded error guarantee.

If you want to practice explaining A* while an AI pushes back on your heuristic choice or complexity claim, SpaceComplexity runs voice-based mock interviews with rubric scoring on exactly these problems. The gap between knowing A* and being able to walk an interviewer through the admissibility proof under pressure is where prep usually breaks down.

For related reference material on graph traversal and the algorithms A* builds on, see the Dijkstra's algorithm deep dive for the uniform-cost foundation, BFS vs DFS for the uninformed baseline, and the heap data structure guide for a full breakdown of the priority queue mechanics A*'s open set depends on.