Edit Distance Algorithm: The DP Problem Behind Every Spell Checker

You type "teh". Your phone corrects it to "the". It also, with equal confidence, sometimes corrects a perfectly valid word into something that makes no sense in context. But when it works, that fix isn't magic. It's a number: the minimum single-character edits needed to turn one string into another. That number is the edit distance. The algorithm that computes it is one of the cleanest dynamic programming problems you'll encounter in interviews, and it shows up absolutely everywhere in production.

One Distance, Three Operations

Edit distance (also called Levenshtein distance, named after Vladimir Levenshtein who described it in 1965) measures how different two strings are by counting the minimum operations needed to transform one into the other.

The three allowed operations are:

Insert a character
Delete a character
Substitute one character for another

Each costs exactly 1. That's the whole cost model. cat to bat is distance 1 (substitute c with b). cat to at is also 1 (delete c). Two entirely different edits, same price.

The canonical example: "kitten" to "sitting" has edit distance 3.

Substitute k with s: kitten -> sitten
Substitute e with i: sitten -> sittin
Insert g at end: sittin -> sitting

Spend a few minutes convincing yourself no three-step shortcut exists. There isn't one. This is the kind of claim that sounds obvious until you sit down to actually verify it.

Why Brute Force Stops You Cold

Your first instinct is recursion: at each position, try all three operations and take the minimum. This instinct is correct in shape and catastrophic in performance. For strings of length m and n, the call tree fans out exponentially. The subproblem "what's the edit distance between the first 3 characters of word1 and the first 2 of word2?" gets recomputed hundreds of times, blissfully unaware you answered the same question five branches ago.

This is the textbook signal for dynamic programming: overlapping subproblems with optimal substructure. If the optimal edit path from "kitten" to "sitting" passes through some intermediate state, then the path forward from that state also has to be optimal. If it weren't, you could swap in a cheaper onward path, which would contradict the original being optimal. One-line proof, feel free to use it in the interview.

If you're shaky on when to reach for DP, this framework for recognizing DP problems is worth reading before you continue. The DP framework post covers the top-down vs bottom-up decision in more depth.

The Recurrence That Turns Exponential Into O(m*n)

Define dp[i][j] as the edit distance between the first i characters of word1 and the first j characters of word2.

Build the table from small cases up. The recurrence falls out from one question: what's the cheapest way to handle character i of word1 against character j of word2?

If the characters match, you get a free pass:

dp[i][j] = dp[i-1][j-1]     # characters match, no operation needed

If they don't match, pick the cheapest of three options:

dp[i][j] = 1 + min(
    dp[i-1][j-1],   # substitute word1[i] with word2[j]
    dp[i-1][j],     # delete word1[i]
    dp[i][j-1]      # insert word2[j] into word1
)

Those three neighbors form a little triangle above and to the left of the current cell. Substitute looks diagonally. Delete looks up. Insert looks left. Draw this on a whiteboard once and it'll stay with you.

The base cases handle transforming to or from the empty string: dp[i][0] = i (deleting all i characters) and dp[0][j] = j (inserting all j characters). Write these down before you code anything else. They will absolutely come up in the interview.

What Does the Table Actually Say?

Let's trace "horse" -> "ros" (LeetCode 72's example). The answer is 3. Rows index into "horse", columns into "ros":

DP table showing edit distance computation from "horse" to "ros", with the answer 3 in the bottom-right cell

		r	o	s
	0	1	2	3
h	1	1	2	3
o	2	2	1	2
r	3	2	2	2
s	4	3	3	2
e	5	4	4	3

Bottom-right reads 3. Done.

Pick one cell to verify: dp[2][2] compares "ho" vs "ro". Characters differ (h vs r), so: 1 + min(dp[1][1], dp[1][2], dp[2][1]) = 1 + min(1, 2, 2) = 2. Checks out. You should do this verification step on your own table in the interview. Unprompted.

The Implementation

def min_distance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]

function minDistance(word1: string, word2: string): number {
    const m = word1.length;
    const n = word2.length;
    const dp: number[][] = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));

    for (let i = 0; i <= m; i++) dp[i][0] = i;
    for (let j = 0; j <= n; j++) dp[0][j] = j;

    for (let i = 1; i <= m; i++) {
        for (let j = 1; j <= n; j++) {
            if (word1[i - 1] === word2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = 1 + Math.min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    return dp[m][n];
}

O(m * n) Time, and How to Halve the Space

Time is O(m * n) because you fill every cell exactly once. Space drops to O(min(m, n)) with a rolling array. Each row only looks at the row directly above it, so you only need two rows alive at once.

def min_distance_optimized(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    if m < n:
        word1, word2 = word2, word1
        m, n = n, m

    prev = list(range(n + 1))
    curr = [0] * (n + 1)

    for i in range(1, m + 1):
        curr[0] = i
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j - 1], prev[j], curr[j - 1])
        prev, curr = curr, prev

    return prev[n]

The tradeoff: once you discard old rows, you can no longer reconstruct which specific operations were performed. If you need the actual edit path and not just the distance, either keep the full table or use Hirschberg's algorithm, which recovers the path in O(m + n) space at the same O(m * n) time cost.

For most interview problems, you only need the count. Go with the rolling array. The bottom-up DP and space optimization post covers why this pattern generalizes beyond edit distance.

Why Interviews Love This Problem

LeetCode 72 is labeled Hard. It earns that label not because the recurrence is clever, but because it punishes half-understood DP more mercilessly than almost anything else at this difficulty level. You can walk in having memorized the recurrence and still go completely off the rails when an interviewer asks one pointed question.

Here's what they probe:

Index handling. The table is (m+1) x (n+1) because row 0 and column 0 represent the empty string, not the first characters. Miss this and your base cases are subtly wrong. The resulting bugs look like algorithmic errors, not implementation slip-ups, which is the worst kind.

Base case reasoning. dp[i][0] = i encodes "it takes i deletions to reduce an i-character string to empty." Candidates who can't explain this out loud, in words, while maintaining eye contact, score lower on problem-solving even when their code is correct. Memorized code without understanding is obvious from three feet away.

Naming the three operations. When word1[i-1] != word2[j-1], which of dp[i-1][j-1], dp[i-1][j], and dp[i][j-1] corresponds to substitute, delete, and insert? Interviewers ask this directly. Staring at your own code for fifteen seconds while trying to reconstruct it from first principles is not a great look.

Complexity follow-up. Can you reduce to O(min(m, n)) space and explain exactly why the rolling array is valid? This is the question that separates candidates who understand the table from those who pattern-matched their way to a solution.

The problem also shows up as a subcomponent in harder variants. "Return true if you can make word1 equal word2 with at most k changes" reduces to checking dp[m][n] <= k. It also underlies string similarity scoring in NLP pipelines, where you often want a normalized distance rather than a raw count.

If you want to practice explaining this recurrence out loud under real interview pressure, SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback on exactly this kind of DP depth. Talking through a table trace while someone is watching is a completely different skill from writing it silently on LeetCode.

The Edit Distance Algorithm in Production

The algorithm is everywhere once you know to look for it. Git's diff algorithm is based on a related sequence alignment approach: every git diff output you've ever read was the result of something in this family of problems. Spell checkers rank candidate corrections by edit distance to your typo, including the one that confidently replaced "teh" with the wrong word entirely. DNA alignment tools measure gene sequence similarity by counting mutations: insertions, deletions, and substitutions map directly onto the three operations. Fuzzy database matching joins records that humans entered slightly differently across two tables.

One production gap worth knowing: standard Levenshtein counts a transposition ("teh" to "the") as 2 operations (delete t, insert it in the right position), even though it's one physical finger slip. The Damerau-Levenshtein variant adds transposition as a fourth operation, cost 1. Most production spell checkers use this extended version. In interviews, assume standard Levenshtein unless told otherwise.