The KMP Algorithm: Search in O(n+m) by Never Going Backward

May 26, 202628 min read
dsaalgorithmsinterview-prepleetcode
The KMP Algorithm: Search in O(n+m) by Never Going Backward
TL;DR
  • The KMP algorithm preprocesses the pattern into an LPS (failure) table in O(m), then scans the text in O(n), for O(n+m) total with no backtracking on the text pointer.
  • lps[i] stores the length of the longest proper prefix of pattern[0..i] that equals its suffix, so a mismatch tells you exactly how far back to restart the pattern.
  • The text pointer i never decrements. Pattern pointer j can fall back via lps[j-1], but its total decrements are bounded by its total increments, both at most n.
  • After a full match, set j = lps[m-1] not 0, to correctly find overlapping occurrences (e.g., two matches of "AA" inside "AAA").
  • lps[m-1] encodes periodicity: if m % (m - lps[m-1]) == 0, the pattern is a repeated substring, solving LeetCode 459 in one line of arithmetic.
  • KMP wins on small alphabets (DNA = 4 chars, binary = 2 chars) where Boyer-Moore's bad-character table gives almost no skip benefit.

You have a 100 MB log file and a 20-character error pattern. The naive approach reads every byte of the file and, for each position, potentially re-reads all 20 characters of the pattern before giving up. In the worst case (a file filled with "AAAAAB..." where you match 19 A's before failing) you do 2 billion comparisons. Your laptop fan will have opinions about this.

The KMP algorithm reads each character of the text exactly once. The text pointer moves only forward. Never backward. No matter how badly the pattern fails to match, you never retrace your steps through the text.

The mental model: a pattern that has memorized its own structure, so when it fails at any position, it already knows the furthest point it can safely restart from.

Use it when you need guaranteed linear-time substring search, especially when the alphabet is small (DNA is four characters, binary data is two) or when the pattern has strong internal repetition. It's also the cleanest solution for LeetCode's "strStr" family of problems.

What the Naive Approach Costs You

Naive search maintains two indices: i into the text and j into the pattern. When text[i] == pattern[j], both advance. When they diverge:

text:    A A A A A A A A A B
pattern: A A A A A A A A A B
         i=0               i=9 -> MATCH at 0

On next attempt:
text:    A A A A A A A A A B
pattern:   A A A A A A A A A B
           i=1               i=10 -> MATCH at 1

For text of length n and pattern of length m, if they almost-match everywhere, you do O(nm) comparisons. The bottleneck is that after a mismatch you reset j = 0 and advance i by just one. You throw away everything you learned about the previous partial match. The naive approach has the memory of a goldfish.

But you do know something. When you matched j characters and then failed, you know exactly what those j characters were: they were pattern[0..j-1]. The text characters you read are not a mystery. They're baked into the pattern itself.

KMP exploits this: instead of resetting j = 0, it restarts the pattern at the longest prefix that is already aligned at the text's current position. You never throw away a confirmed match.

Naive O(nm) vs KMP O(n+m): naive rewinds the text pointer repeatedly, KMP never does Naive search resets i backward on every failed attempt. KMP keeps i moving forward and falls back only j, which is bounded.

The Failure Table: How a Pattern Learns Its Own Structure

The failure function. Great name. Extremely encouraging.

The failure function (also called the LPS array, for "Longest Proper Prefix which is also a Suffix," or the prefix function) is a 1D integer array of length m. lps[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i].

"Proper" means strictly shorter than the whole string. So lps[0] is always 0 by definition.

For pattern "ABCAB":

i   substring   longest proper prefix = suffix   lps[i]
0   A           (none)                            0
1   AB          (none, "A" prefix != "B" suffix)  0
2   ABC         (none)                            0
3   ABCA        "A" = prefix = suffix             1
4   ABCAB       "AB" = prefix = suffix            2

So lps = [0, 0, 0, 1, 2].

The value lps[4] = 2 means: the last 2 characters of pattern[0..4] spell "AB", which is also the first 2 characters of the pattern. If you matched 5 characters and then hit a mismatch, you can restart the pattern comparison at index 2, because the text already has "AB" sitting right there.

Why the Build Runs in O(m)

The construction algorithm looks deceptively like it might be O(m²). There's a while loop inside a while loop. It wants you to think O(m²). It's wrong.

maintain: length = length of previous longest prefix-suffix
          i = current position in pattern (starts at 1)

while i < m:
    if pattern[i] == pattern[length]:
        length += 1
        lps[i] = length
        i += 1
    else:
        if length != 0:
            length = lps[length - 1]   // do NOT increment i
        else:
            lps[i] = 0
            i += 1

The critical line is length = lps[length - 1]. When pattern[i] doesn't match pattern[length], you can't extend the current border. But you can ask: is there a shorter border that still works? lps[length - 1] gives you the longest border of the border. You follow this chain of fallbacks until either you find a match or length hits zero.

This chain traversal is O(m) amortized, not O(m²), because length can only decrease as many times as it has increased, and it increases at most once per step of i.

More precisely: length increases by 1 exactly when we advance i. Since i goes from 1 to m, total increases to length are at most m - 1. Each lps[length - 1] step strictly decreases length. Total decreases are therefore also bounded by m - 1. The whole construction runs in O(m).

Tracing "ABABCAB"

Let me walk through lps construction for a slightly richer example, "ABABCAB":

lps = [0, _, _, _, _, _, _]    length=0

i=1: 'B' vs pattern[0]='A'  mismatch, length=0 → lps[1]=0, i=2
i=2: 'A' vs 'A'             match → length=1, lps[2]=1, i=3
i=3: 'B' vs pattern[1]='B'  match → length=2, lps[3]=2, i=4
i=4: 'C' vs pattern[2]='A'  mismatch, length=2 → length=lps[1]=0
     'C' vs pattern[0]='A'  mismatch, length=0 → lps[4]=0, i=5
i=5: 'A' vs 'A'             match → length=1, lps[5]=1, i=6
i=6: 'B' vs pattern[1]='B'  match → length=2, lps[6]=2, i=7

lps = [0, 0, 1, 2, 0, 1, 2]

Notice i=4: when we hit the C, length drops from 2 to lps[1]=0 in one jump, then fails again and we set lps[4]=0. That's the fallback chain in action.

Pattern:  A  B  A  B  C  A  B
LPS:      0  0  1  2  0  1  2
          |        |        |
          +-"AB"---+        |
                   +-"AB"---+

The arrows show the "if you fail here, restart here" links. If you match 4 characters ("ABAB") and fail on the 5th, jump to index 2: the pattern's first 2 characters ("AB") are already lined up.

LPS (failure function) table for pattern "ABABCAB", blue shows one prefix-suffix pair, amber shows another Each non-zero LPS value is a restart point. Fail at j=4, restart at j=lps[3]=2. The pattern remembers where it overlaps itself.

The Search: i Never Goes Backward

The search falls out almost directly from the LPS table.

i = 0   // text index
j = 0   // pattern index

while i < n:
    if text[i] == pattern[j]:
        i += 1
        j += 1

    if j == m:                        // full match found
        record match at i - j
        j = lps[j - 1]               // continue for overlapping matches

    elif text[i] != pattern[j]:       // mismatch
        if j != 0:
            j = lps[j - 1]           // shift pattern, don't move i
        else:
            i += 1                    // pattern exhausted, advance text

The text index i only increments. It never decrements. That's the key property. After processing text[i], you never come back to it.

j can decrease (via lps[j-1]) but each decrease is bounded by a prior increase, and increases only happen alongside i increases. So total j-moves are O(n). Combined with the O(m) preprocessing, you get O(n + m) total.

Tracing a Search

Text: "ABCABABCAB", Pattern: "ABCAB", LPS: [0, 0, 0, 1, 2]

i=0,j=0: A==A → i=1,j=1
i=1,j=1: B==B → i=2,j=2
i=2,j=2: C==C → i=3,j=3
i=3,j=3: A==A → i=4,j=4
i=4,j=4: B==B → i=5,j=5=m → MATCH at index 0
                              j = lps[4] = 2

i=5,j=2: 'A' vs pattern[2]='C': mismatch, j=lps[1]=0
i=5,j=0: A==A → i=6,j=1
i=6,j=1: B==B → i=7,j=2
i=7,j=2: C==C → i=8,j=3
i=8,j=3: A==A → i=9,j=4
i=9,j=4: B==B → i=10,j=5=m → MATCH at index 5

Two matches. The text pointer i went from 0 to 10, never back.

After the first match, we used j = lps[4] = 2 rather than j = 0. We kept "AB" in position because those two characters are a valid prefix of the pattern and they're sitting right there in the text. This is how KMP handles overlapping matches correctly, and it's a common bug in implementations that reset j = 0 after a match.

KMP search trace: i moves from 0 to 9 without a single backward step; two matches found Two matches, zero backtracking. After the first match at index 0, j resets to lps[4]=2 rather than 0, so "AB" stays in play for the second match.

What You Actually Pay

OperationTimeSpaceNotes
Build LPSO(m)O(m)amortized; length increases ≤ m times
SearchO(n)O(1)i never decrements; j shifts bounded by total increases
TotalO(n + m)O(m)text and pattern each processed once

Worst case = average case for KMP. There is no lucky input. A pattern like "AAAA...AB" searching "AAAA...A" triggers the maximum fallback chain at every position, and it's still O(n + m). The algorithm does not care about your inputs.

Naive search is also O(nm) worst case, and that worst case is common (DNA sequences, binary data, any text with repetition).

Boyer-Moore with the bad-character rule is faster in practice for English text (3 to 4 characters skipped per step on average) because it can skip forward by more than 1. KMP always moves i forward by exactly 1 per mismatch (when j == 0). For small alphabets like DNA (four characters) or binary data (two characters), Boyer-Moore's preprocessing table is too dense to help much, and KMP often wins.

The Sentinel Trick

There's a cleaner way to think about KMP search that avoids implementing two separate functions. Concatenate pattern + "#" + text where "#" is a separator character not in either string. Then build the LPS array for this combined string.

Wherever lps[i] == m, you have a position where the preceding m characters match the pattern. The match starts at i - 2*m in the combined string, which is i - 2*m - 1 + m = i - m - 1 in the original text.

The "#" separator is not optional. Skip it and the pattern's prefix bleeds into the text portion of the LPS computation, giving you spurious values and matches that were never there. The separator guarantees lps[i] < m for all positions until a real match occurs.

This is how many competitive programmers implement KMP: one LPS function, no separate search logic.

The Periodicity Hidden in lps[m-1]

If m % (m - lps[m-1]) == 0, the pattern is composed of a single substring repeated multiple times.

For "ABABABAB", lps[7] = 6. Period = 8 - 6 = 2. 8 % 2 == 0. So the pattern is "AB" repeated 4 times.

This falls directly out of the LPS definition: if lps[m-1] = k, then the first k characters equal the last k characters. If the remainder m - k divides m, that remainder is the shortest repeating unit. This is LeetCode 459 ("Repeated Substring Pattern") in one line of arithmetic after building the LPS array.

One Structure, Every Language

All implementations expose two functions: buildLps (constructs the failure table) and kmpSearch (runs the search, returns all match start indices).

from typing import List def build_lps(pattern: str) -> List[int]: m = len(pattern) lps = [0] * m length = 0 i = 1 while i < m: if pattern[i] == pattern[length]: length += 1 lps[i] = length i += 1 else: if length != 0: length = lps[length - 1] else: lps[i] = 0 i += 1 return lps def kmp_search(text: str, pattern: str) -> List[int]: n, m = len(text), len(pattern) if m == 0: return list(range(n + 1)) lps = build_lps(pattern) matches: List[int] = [] i = j = 0 while i < n: if text[i] == pattern[j]: i += 1 j += 1 if j == m: matches.append(i - j) j = lps[j - 1] elif i < n and text[i] != pattern[j]: if j != 0: j = lps[j - 1] else: i += 1 return matches

What Problems It Solves

Substring search with a worst-case guarantee. Any situation where you need to find all occurrences of a pattern in a text and you can't accept O(nm) worst-case behavior. grep-style tools, text editors' find-and-replace on large files, stream processing where data arrives character by character.

Small-alphabet search. DNA sequencing involves four characters (A, C, G, T). Binary protocol parsing involves two (0, 1). For these alphabets, Boyer-Moore's bad-character rule provides almost no skip information, and KMP's constant-time mismatch handling is competitive.

Repeated pattern detection. The lps[m-1] periodicity trick tells you in O(m) whether a string is a repeated substring. LeetCode 459. Also used in string compression: if a string of length n has a period of k (and k divides n), you can represent it as the period repeated n/k times.

String rotation checking. t is a rotation of s if and only if t appears in s + s. Run KMP with pattern t on text s + s. O(n) instead of the O(n²) naive approach.

Longest palindromic prefix. Build a string s + "#" + reverse(s), compute its LPS array. lps[2n] gives the length of the longest palindromic prefix. Used in LeetCode 214 ("Shortest Palindrome").

Streaming pattern matching. Because KMP processes the text left to right and only needs the LPS array (precomputed from the pattern), you can feed text characters one at a time without buffering. The KMP automaton state is just the current value of j. Useful for network intrusion detection systems, log stream analysis, and any context where the text isn't fully available upfront.

When Does the KMP Algorithm Apply?

The primary signal: you need to find a fixed pattern in a larger text, and naive O(nm) is too slow, or the interviewer asks for O(n + m).

Secondary signals:

  • The alphabet is small (binary, DNA, hex). For English text with long patterns, Boyer-Moore usually wins.
  • The problem asks about overlapping occurrences. KMP handles these correctly by default via the lps[m-1] restart.
  • The pattern itself has interesting structural properties (periodicity, palindromic prefix). The LPS array you build for searching doubles as the answer to these structure questions.
  • The problem mentions "repeated substring" or "string rotation" or "palindromic prefix."

Worked Example 1: LeetCode 28, Find the Index of the First Occurrence

Problem: given haystack and needle, return the index of the first occurrence of needle in haystack, or -1 if it doesn't exist.

This is the strStr problem directly. haystack.find(needle) in Python uses Boyer-Moore-Horspool internally, so it's O(nm) worst case. KMP guarantees O(n + m).

def strStr(haystack: str, needle: str) -> int: matches = kmp_search(haystack, needle) return matches[0] if matches else -1

The LPS table for "needle": check needle[5]='e' vs needle[0]='n', mismatch, so lps[5]=0. The full table is [0, 0, 0, 0, 0, 0]. No internal repetition, so KMP behaves close to naive on this particular pattern. For needle = "ABABAB" the LPS is [0, 0, 1, 2, 3, 4], and the search skips significant work.

Worked Example 2: LeetCode 459, Repeated Substring Pattern

Problem: given string s, return true if it can be constructed by repeating a substring multiple times.

The naive approach tries all possible period lengths from 1 to m/2. O(m²) worst case.

Build the LPS array for s. If lps[m-1] > 0 and m % (m - lps[m-1]) == 0, the answer is true.

Why this works: lps[m-1] = k means the first k characters equal the last k characters. The candidate period is p = m - k. If p divides m, then starting from the period, every subsequent block of length p repeats the same characters (this can be proved by induction on the prefix structure). If p does not divide m, the overlap doesn't tile the string evenly.

def repeatedSubstringPattern(s: str) -> bool: m = len(s) lps = build_lps(s) period = m - lps[m - 1] return lps[m - 1] > 0 and m % period == 0

O(m) time. O(m) space. One pass.

Worked Example 3: String Rotation Check

Problem: is t a rotation of s?

A rotation of s = "ABCDE" is "CDEAB". Notice "CDEAB" appears in "ABCDEABCDE" (which is s + s). So t is a rotation of s iff len(t) == len(s) and t is a substring of s + s.

def isRotation(s: str, t: str) -> bool: if len(s) != len(t): return False return bool(kmp_search(s + s, t))

KMP gives this in O(n). The naive t in (s + s) in Python is Boyer-Moore-Horspool, fast in practice but not provably linear. KMP is.

Three Bugs That Show Up Every Time

You will write at least one of these. Two is par. All three at once means you have truly lived.

Bug 1: resetting j to 0 after a full match. If you write j = 0 instead of j = lps[j - 1] after recording a match, you miss overlapping occurrences. For pattern "AA" in text "AAA", the correct answer is matches at positions 0 and 1. Resetting j = 0 gives only position 0.

Bug 2: incrementing i when falling back. In the LPS construction, when length != 0 and there's a mismatch, the correct move is length = lps[length - 1] without incrementing i. Incrementing i here skips a character in the pattern and produces wrong LPS values.

Bug 3: confusing "proper prefix" with "prefix." lps[0] is always 0. A string of length 1 has no proper prefix (the prefix "A" of "A" is not proper because it equals the whole string). If your LPS table starts at index 0 with a nonzero value, something went wrong.

What to Remember

  • KMP builds an LPS (failure) table from the pattern in O(m), then scans the text in O(n), for O(n + m) total.
  • lps[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix.
  • The text pointer i never goes backward. j (pattern pointer) can fall back via lps[j-1], but its total decrements are bounded by its total increments, both at most n.
  • After a full match, set j = lps[m-1] (not 0) to handle overlapping matches.
  • The LPS construction inner loop is O(m) amortized: length increases at most m times, decreases at most m times.
  • lps[m-1] encodes the pattern's periodic structure: period = m - lps[m-1].
  • The sentinel trick (pattern + "#" + text) unifies construction and search into one LPS call.
  • For large alphabets and long patterns, Boyer-Moore is faster in practice. KMP wins on small alphabets and streaming contexts.

Practicing this on paper is one thing. Actually narrating why the LPS table is O(m) amortized under interview pressure, while an interviewer asks follow-up questions, is different. SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback, so you can practice exactly that kind of explanation before it counts.

The suffix array covers a related approach to substring search that trades preprocessing time for faster multi-pattern queries. The string internals and interview patterns post covers the broader family of string problems you'll see in interviews, including where KMP fits in the toolbox.

Further Reading