Rolling Hash Algorithm: The O(1) Window Trick for String Search

- Rolling hash avoids O(n·m) string search by computing a polynomial hash for the first window, then sliding it in O(1) per step
- Horner's method builds the initial hash left-to-right without computing large intermediate powers
- The slide formula subtracts the outgoing character's contribution, shifts remaining terms up one power, then appends the new character
- Add MOD before subtracting to prevent negative remainders in Java, Go, C, and any language where
%is remainder-not-modulus - Always verify after a hash match; skipping the substring comparison turns a probabilistic trick into a correctness bug
- Binary search on length combined with rolling hash is the intended solution for LeetCode 1044 (Longest Duplicate Substring) and 718 (Maximum Length of Repeated Subarray)
- Double hashing with two independent (B, P) pairs defeats adversarial collision attacks by reducing false-positive probability to ~1/(P₁·P₂)
You have a string. You want to find another string inside it. Simple enough. Except the naive approach slides a length-m pattern across n characters and compares all m characters at every position. That's O(n*m). Perfectly fine for tiny inputs. Quietly catastrophic the moment your pattern is 1000 characters long and you're searching a log file nobody warned you would be 400MB.
The rolling hash algorithm fixes this by computing a hash for the initial window in O(m), then updating it in O(1) as the window slides. You're not re-hashing from scratch each time. You're doing arithmetic on the previous hash. The result is an O(n+m) algorithm that handles substring search, duplicate detection, and binary-search-on-length problems that show up regularly in interviews.
Recomputing the Hash Every Step Is Just O(n*m) in Disguise
Imagine you're looking for "ab" in "cabd". At position 0 you hash "ca". At position 1 you hash "ab". At position 2 you hash "bd". Each hash computation reads every character in the window. If your pattern is 100 characters long, you're doing 100 comparisons per step even before you find anything.
The observation that makes rolling hash work is that consecutive windows share m-1 characters. When you slide one position right, you drop one character from the left and add one on the right. The hash should reflect that incrementally, not require a full recompute.
Think of it like a conveyor belt. The belt moves forward one box. You don't dismantle the entire belt and rebuild it from scratch just because one new box arrived at the end.
The Polynomial Hash Formula
Rolling hash uses a polynomial evaluated at a base B, taken mod a large prime P:
H(s) = s[0]*B^(m-1) + s[1]*B^(m-2) + ... + s[m-1] mod P
Each character contributes its value multiplied by a decreasing power of B. The highest power goes to the leftmost character, so "ab" and "ba" hash differently. Character values are c - 'a' + 1 ('a'=1, 'b'=2, and so on), which avoids zero-value characters collapsing polynomial terms into nothing.
The formula is exactly Horner's method: start with 0, multiply by B, add the next character value, repeat. This lets you build the initial hash in a single left-to-right pass without computing enormous intermediate powers.
The Slide Step Costs Three Operations. That's It.
When the window moves right by one position, you drop s[left] and add s[right+1]. In polynomial terms:
H_new = (H_old - s[left] * B^(m-1)) * B + s[right] mod P
Break it down: subtract the contribution of the character you're removing (it had the highest power), multiply by B to shift the remaining characters up one power, then add the new character at the lowest power (B^0 = 1).
You precompute B^(m-1) once before the loop, so each slide is three arithmetic operations regardless of window size. Computing it fresh inside the slide loop reintroduces O(m) per step and erases the entire gain. In Python: pow(BASE, m-1, MOD). In other languages, a simple loop before the main loop handles it. Never put it inside.
A Walk-Through You Can Actually Follow
Pattern: "ab" (m=2), Text: "cabd". Use B=31, P=10^9+7, character value = c - 'a' + 1.
Character values: 'a'=1, 'b'=2, 'c'=3, 'd'=4.
Pattern hash:
H("ab") = 1*31^1 + 2*31^0 = 31 + 2 = 33
Initial window "ca":
H("ca") = 3*31 + 1 = 94
94 != 33. No match. Slide to "ab":
H_new = (94 - 3*31) * 31 + 2 = (94 - 93) * 31 + 2 = 1*31 + 2 = 33
33 == 33, potential match. Verify by comparing the substring: text[1..2] == "ab". Confirmed at index 1.
The slide step dropped 'c' and added 'b' without touching 'a'. The 'a' got a power promotion (B^0 to B^1) while 'c' got quietly let go. Corporate restructuring, but for polynomials.
Why Modular Arithmetic? (Because 31^99 Does Not Fit Anywhere)
B^(m-1) grows exponentially. For m=100 and B=31, that's 31^99, a number with roughly 147 digits. No integer type holds that. You take every computation mod a large prime P to keep values in range.
A large prime like 10^9+7 works well because it fits in a 64-bit integer when you multiply two values (since (10^9+7)^2 < 2^63), and it's large enough to make accidental collisions rare.
The trap is subtraction: (H_old - s[left] * power) % P goes negative in Java, Go, and C, where % is remainder with the dividend's sign, not true modulus. Python handles negative modulo correctly. Java, Go, and C do not. The fix:
H_new = ((H_old - s[left] * power % P + P) % P * B + s[right]) % P
That + P before the outer % P ensures the result stays non-negative. This is the bug that bites everyone the first time they port rolling hash from Python to Java. The code runs. The tests pass on the happy path. Then one test with a leading character of high value fails, and you spend twenty minutes confused.
Hash Collisions: The Algorithm Is Occasionally Lying
Two different strings can produce the same hash. With P=10^9+7, the per-comparison collision probability is about 1 in 10^9. Over n comparisons, the expected number of false positives is n/P. Small but nonzero.
Whenever a hash match occurs, verify with a direct substring comparison. This is not optional. It preserves correctness without hurting average-case complexity, because false positives are rare and verification only fires on genuine matches in practice.
For adversarial inputs, where someone knows your hash parameters and deliberately constructs collisions, use double hashing: two independent (B, P) pairs, only accept a match when both agree. Collision probability drops to roughly 1 in P1*P2, which is effectively zero.
O(n+m) Time, O(1) Space. Say It With Confidence.
The initial hash computation takes O(m). Each of the n-m+1 slide steps takes O(1). Verification on a hash match costs O(m) worst case, but fires at most O(n/m) times on average, amortizing to O(n).
Total: O(n+m) time, O(1) space. No extra arrays, no preprocessing tables. Just a handful of integers.
Compare to KMP, which is also O(n+m) time but requires O(m) space for the failure function, or Rabin-Karp (which is essentially this algorithm named after its inventors). For a comparison of string search strategies, /blog/kmp-vs-rabin-karp walks through when each one wins. For a deeper look at Rabin-Karp specifically, see /blog/rabin-karp-algorithm.
One Structure, Every Language
Each implementation below finds the first occurrence of pattern in text and returns its index (or -1 if not found). BASE=31, MOD=10^9+7, character values are c - 'a' + 1.
Python 3
def rolling_hash_search(text: str, pattern: str) -> int: BASE = 31 MOD = 10**9 + 7 n, m = len(text), len(pattern) if m > n: return -1 power = 1 for _ in range(m - 1): power = (power * BASE) % MOD def char_val(c: str) -> int: return ord(c) - ord('a') + 1 pattern_hash = 0 window_hash = 0 for i in range(m): pattern_hash = (pattern_hash * BASE + char_val(pattern[i])) % MOD window_hash = (window_hash * BASE + char_val(text[i])) % MOD if window_hash == pattern_hash and text[:m] == pattern: return 0 for i in range(1, n - m + 1): window_hash = ( (window_hash - char_val(text[i - 1]) * power % MOD + MOD) % MOD * BASE + char_val(text[i + m - 1]) ) % MOD if window_hash == pattern_hash and text[i:i + m] == pattern: return i return -1
Python 2
def rolling_hash_search(text, pattern): BASE = 31 MOD = 10**9 + 7 n, m = len(text), len(pattern) if m > n: return -1 power = 1 for _ in range(m - 1): power = (power * BASE) % MOD def char_val(c): return ord(c) - ord('a') + 1 pattern_hash = 0 window_hash = 0 for i in range(m): pattern_hash = (pattern_hash * BASE + char_val(pattern[i])) % MOD window_hash = (window_hash * BASE + char_val(text[i])) % MOD if window_hash == pattern_hash and text[:m] == pattern: return 0 for i in range(1, n - m + 1): window_hash = ( (window_hash - char_val(text[i - 1]) * power % MOD + MOD) % MOD * BASE + char_val(text[i + m - 1]) ) % MOD if window_hash == pattern_hash and text[i:i + m] == pattern: return i return -1
JavaScript (BigInt to avoid float precision issues)
function rollingHashSearch(text, pattern) { const BASE = 31n; const MOD = 1_000_000_007n; const n = text.length; const m = pattern.length; if (m > n) return -1; const charVal = (c) => BigInt(c.charCodeAt(0) - 'a'.charCodeAt(0) + 1); let power = 1n; for (let i = 0; i < m - 1; i++) { power = (power * BASE) % MOD; } let patternHash = 0n; let windowHash = 0n; for (let i = 0; i < m; i++) { patternHash = (patternHash * BASE + charVal(pattern[i])) % MOD; windowHash = (windowHash * BASE + charVal(text[i])) % MOD; } if (windowHash === patternHash && text.slice(0, m) === pattern) return 0; for (let i = 1; i <= n - m; i++) { windowHash = ( ((windowHash - charVal(text[i - 1]) * power % MOD + MOD) % MOD) * BASE + charVal(text[i + m - 1]) ) % MOD; if (windowHash === patternHash && text.slice(i, i + m) === pattern) return i; } return -1; }
TypeScript (BigInt to avoid float precision issues)
function rollingHashSearch(text: string, pattern: string): number { const BASE = 31n; const MOD = 1_000_000_007n; const n = text.length; const m = pattern.length; if (m > n) return -1; const charVal = (c: string): bigint => BigInt(c.charCodeAt(0) - 'a'.charCodeAt(0) + 1); let power = 1n; for (let i = 0; i < m - 1; i++) { power = (power * BASE) % MOD; } let patternHash = 0n; let windowHash = 0n; for (let i = 0; i < m; i++) { patternHash = (patternHash * BASE + charVal(pattern[i])) % MOD; windowHash = (windowHash * BASE + charVal(text[i])) % MOD; } if (windowHash === patternHash && text.slice(0, m) === pattern) return 0; for (let i = 1; i <= n - m; i++) { windowHash = ( ((windowHash - charVal(text[i - 1]) * power % MOD + MOD) % MOD) * BASE + charVal(text[i + m - 1]) ) % MOD; if (windowHash === patternHash && text.slice(i, i + m) === pattern) return i; } return -1; }
Java
public class RollingHash { public static int rollingHashSearch(String text, String pattern) { final long BASE = 31; final long MOD = 1_000_000_007L; int n = text.length(); int m = pattern.length(); if (m > n) return -1; long power = 1; for (int i = 0; i < m - 1; i++) { power = (power * BASE) % MOD; } long patternHash = 0; long windowHash = 0; for (int i = 0; i < m; i++) { patternHash = (patternHash * BASE + (pattern.charAt(i) - 'a' + 1)) % MOD; windowHash = (windowHash * BASE + (text.charAt(i) - 'a' + 1)) % MOD; } if (windowHash == patternHash && text.substring(0, m).equals(pattern)) return 0; for (int i = 1; i <= n - m; i++) { windowHash = ( ((windowHash - (text.charAt(i - 1) - 'a' + 1) * power % MOD + MOD) % MOD) * BASE + (text.charAt(i + m - 1) - 'a' + 1) ) % MOD; if (windowHash == patternHash && text.substring(i, i + m).equals(pattern)) return i; } return -1; } }
C++
#include <string> int rollingHashSearch(const std::string& text, const std::string& pattern) { const long long BASE = 31; const long long MOD = 1'000'000'007LL; int n = text.size(); int m = pattern.size(); if (m > n) return -1; long long power = 1; for (int i = 0; i < m - 1; i++) { power = (power * BASE) % MOD; } long long pattern_hash = 0; long long window_hash = 0; for (int i = 0; i < m; i++) { pattern_hash = (pattern_hash * BASE + (pattern[i] - 'a' + 1)) % MOD; window_hash = (window_hash * BASE + (text[i] - 'a' + 1)) % MOD; } if (window_hash == pattern_hash && text.substr(0, m) == pattern) return 0; for (int i = 1; i <= n - m; i++) { window_hash = ( ((window_hash - (long long)(text[i - 1] - 'a' + 1) * power % MOD + MOD) % MOD) * BASE + (text[i + m - 1] - 'a' + 1) ) % MOD; if (window_hash == pattern_hash && text.substr(i, m) == pattern) return i; } return -1; }
C
#include <string.h> int rollingHashSearch(const char* text, const char* pattern) { const long long BASE = 31; const long long MOD = 1000000007LL; int n = (int)strlen(text); int m = (int)strlen(pattern); if (m > n) return -1; long long power = 1; for (int i = 0; i < m - 1; i++) { power = (power * BASE) % MOD; } long long pattern_hash = 0; long long window_hash = 0; for (int i = 0; i < m; i++) { pattern_hash = (pattern_hash * BASE + (pattern[i] - 'a' + 1)) % MOD; window_hash = (window_hash * BASE + (text[i] - 'a' + 1)) % MOD; } if (window_hash == pattern_hash && strncmp(text, pattern, m) == 0) return 0; for (int i = 1; i <= n - m; i++) { window_hash = ( ((window_hash - (long long)(text[i - 1] - 'a' + 1) * power % MOD + MOD) % MOD) * BASE + (text[i + m - 1] - 'a' + 1) ) % MOD; if (window_hash == pattern_hash && strncmp(text + i, pattern, m) == 0) return i; } return -1; }
Go
func rollingHashSearch(text, pattern string) int { const BASE = 31 const MOD = 1_000_000_007 n, m := len(text), len(pattern) if m > n { return -1 } power := int64(1) for i := 0; i < m-1; i++ { power = (power * BASE) % MOD } charVal := func(c byte) int64 { return int64(c-'a') + 1 } patternHash := int64(0) windowHash := int64(0) for i := 0; i < m; i++ { patternHash = (patternHash*BASE + charVal(pattern[i])) % MOD windowHash = (windowHash*BASE + charVal(text[i])) % MOD } if windowHash == patternHash && text[:m] == pattern { return 0 } for i := 1; i <= n-m; i++ { windowHash = ( ((windowHash-charVal(text[i-1])*power%MOD+MOD)%MOD)*BASE+ charVal(text[i+m-1]), ) % MOD if windowHash == patternHash && text[i:i+m] == pattern { return i } } return -1 }
Rust
fn rolling_hash_search(text: &str, pattern: &str) -> Option<usize> { const BASE: u64 = 31; const MOD: u64 = 1_000_000_007; let text_bytes = text.as_bytes(); let pattern_bytes = pattern.as_bytes(); let n = text_bytes.len(); let m = pattern_bytes.len(); if m > n { return None; } let mut power: u64 = 1; for _ in 0..m - 1 { power = (power * BASE) % MOD; } let char_val = |c: u8| -> u64 { (c - b'a' + 1) as u64 }; let mut pattern_hash: u64 = 0; let mut window_hash: u64 = 0; for i in 0..m { pattern_hash = (pattern_hash * BASE + char_val(pattern_bytes[i])) % MOD; window_hash = (window_hash * BASE + char_val(text_bytes[i])) % MOD; } if window_hash == pattern_hash && &text[..m] == pattern { return Some(0); } for i in 1..=n - m { let remove = char_val(text_bytes[i - 1]) * power % MOD; window_hash = ((window_hash + MOD - remove) % MOD * BASE + char_val(text_bytes[i + m - 1])) % MOD; if window_hash == pattern_hash && &text[i..i + m] == pattern { return Some(i); } } None }
Swift
func rollingHashSearch(_ text: String, _ pattern: String) -> Int? { let BASE: Int64 = 31 let MOD: Int64 = 1_000_000_007 let textArray = Array(text.utf8) let patternArray = Array(pattern.utf8) let n = textArray.count let m = patternArray.count if m > n { return nil } var power: Int64 = 1 for _ in 0..<(m - 1) { power = (power * BASE) % MOD } let charVal: (UInt8) -> Int64 = { Int64($0 - UInt8(ascii: "a") + 1) } var patternHash: Int64 = 0 var windowHash: Int64 = 0 for i in 0..<m { patternHash = (patternHash * BASE + charVal(patternArray[i])) % MOD windowHash = (windowHash * BASE + charVal(textArray[i])) % MOD } if windowHash == patternHash && Array(text.prefix(m)) == Array(pattern) { return 0 } for i in 1...(n - m) { windowHash = ( ((windowHash - charVal(textArray[i - 1]) * power % MOD + MOD) % MOD) * BASE + charVal(textArray[i + m - 1]) ) % MOD if windowHash == patternHash { let start = text.index(text.startIndex, offsetBy: i) let end = text.index(start, offsetBy: m) if text[start..<end] == pattern { return i } } } return nil }
Kotlin
fun rollingHashSearch(text: String, pattern: String): Int { val BASE = 31L val MOD = 1_000_000_007L val n = text.length val m = pattern.length if (m > n) return -1 var power = 1L repeat(m - 1) { power = (power * BASE) % MOD } val charVal: (Char) -> Long = { (it - 'a' + 1).toLong() } var patternHash = 0L var windowHash = 0L for (i in 0 until m) { patternHash = (patternHash * BASE + charVal(pattern[i])) % MOD windowHash = (windowHash * BASE + charVal(text[i])) % MOD } if (windowHash == patternHash && text.substring(0, m) == pattern) return 0 for (i in 1..n - m) { windowHash = ( ((windowHash - charVal(text[i - 1]) * power % MOD + MOD) % MOD) * BASE + charVal(text[i + m - 1]) ) % MOD if (windowHash == patternHash && text.substring(i, i + m) == pattern) return i } return -1 }
C#
public static class RollingHash { public static int RollingHashSearch(string text, string pattern) { const long Base = 31; const long Mod = 1_000_000_007L; int n = text.Length; int m = pattern.Length; if (m > n) return -1; long power = 1; for (int i = 0; i < m - 1; i++) { power = (power * Base) % Mod; } long charVal(char c) => c - 'a' + 1; long patternHash = 0; long windowHash = 0; for (int i = 0; i < m; i++) { patternHash = (patternHash * Base + charVal(pattern[i])) % Mod; windowHash = (windowHash * Base + charVal(text[i])) % Mod; } if (windowHash == patternHash && text[..m] == pattern) return 0; for (int i = 1; i <= n - m; i++) { windowHash = ( ((windowHash - charVal(text[i - 1]) * power % Mod + Mod) % Mod) * Base + charVal(text[i + m - 1]) ) % Mod; if (windowHash == patternHash && text.Substring(i, m) == pattern) return i; } return -1; } }
Ruby
def rolling_hash_search(text, pattern) base = 31 mod = 10**9 + 7 n = text.length m = pattern.length return -1 if m > n power = 1 (m - 1).times { power = (power * base) % mod } char_val = ->(c) { c.ord - 'a'.ord + 1 } pattern_hash = 0 window_hash = 0 m.times do |i| pattern_hash = (pattern_hash * base + char_val.call(pattern[i])) % mod window_hash = (window_hash * base + char_val.call(text[i])) % mod end return 0 if window_hash == pattern_hash && text[0, m] == pattern (1..n - m).each do |i| window_hash = ( ((window_hash - char_val.call(text[i - 1]) * power % mod + mod) % mod) * base + char_val.call(text[i + m - 1]) ) % mod return i if window_hash == pattern_hash && text[i, m] == pattern end -1 end
PHP
function rollingHashSearch(string $text, string $pattern): int { $base = 31; $mod = 1000000007; $n = strlen($text); $m = strlen($pattern); if ($m > $n) return -1; $charVal = fn(string $c): int => ord($c) - ord('a') + 1; $power = 1; for ($i = 0; $i < $m - 1; $i++) { $power = ($power * $base) % $mod; } $patternHash = 0; $windowHash = 0; for ($i = 0; $i < $m; $i++) { $patternHash = ($patternHash * $base + $charVal($pattern[$i])) % $mod; $windowHash = ($windowHash * $base + $charVal($text[$i])) % $mod; } if ($windowHash === $patternHash && substr($text, 0, $m) === $pattern) return 0; for ($i = 1; $i <= $n - $m; $i++) { $windowHash = ( (($windowHash - $charVal($text[$i - 1]) * $power % $mod + $mod) % $mod) * $base + $charVal($text[$i + $m - 1]) ) % $mod; if ($windowHash === $patternHash && substr($text, $i, $m) === $pattern) return $i; } return -1; }
Four Problems Where This Pattern Will Save You
Four LeetCode problems map cleanly to rolling hash. Recognizing the pattern is most of the work.
LeetCode 28 (Find the Index of the First Occurrence in a String) is the direct application: find a pattern in a string. Rolling hash is one valid approach, though KMP (see /blog/kmp-algorithm) is the canonical string-search algorithm here.
LeetCode 187 (Repeated DNA Sequences) asks for all 10-character substrings that appear more than once. Fixed-length window, checking for duplicates. Roll a hash across the string and store seen hashes in a set.
LeetCode 718 (Maximum Length of Repeated Subarray) asks for the longest common subarray between two arrays. The trick: binary search on length, then use rolling hash to check whether any subarray of that length appears in both arrays. O(n log n) total.
LeetCode 1044 (Longest Duplicate Substring) is the classic combo: binary search on the duplicate substring length, rolling hash to check whether any window appears twice. This is where the technique really pays off. A naive O(n^2) approach times out on every non-trivial input.
The pattern to recognize: "find repeated or duplicate substrings" combined with binary search on length. When you see that pair, rolling hash is almost certainly the intended solution. The sliding window pattern handles window movement; rolling hash handles efficient comparison.
The Bugs That Get Everyone, Exactly Once
Forgetting the +MOD on subtraction. In Java, Go, and C, (-3) % 7 is -3, not 4. Add MOD before taking % MOD any time you subtract. Every time. Without exception. This one has caused more "works on my machine" bugs than most.
Using a base that's too small or not coprime with MOD. BASE=1 reduces every hash to the sum of character values mod P, which collides on every anagram. BASE=31 is standard for lowercase letters. BASE=131 or BASE=137 for full ASCII.
Off-by-one in the power precomputation. You need B^(m-1), not B^m. The leftmost of m characters has exponent m-1. Run the loop m-1 times, not m. Running it m times gives you an extra factor of B and produces wrong results with zero helpful error messages.
Using float instead of 64-bit integers. JavaScript's Number type loses precision above 2^53. That's why the JS and TS implementations use BigInt. In other languages, verify you're using int64 or long throughout, especially in the multiplication before the mod.
Skipping verification after a hash match. Omitting the substring comparison doesn't make the algorithm faster. It makes it wrong. False positives are rare but they exist. Verification is O(m) amortized to negligible cost. Do it.
If you want to practice applying rolling hash under real interview pressure, SpaceComplexity runs voice mock interviews where you work through problems like LeetCode 1044 and get rubric-based feedback on your explanation as well as your code.