Why Counting Problems Use mod 1000000007: Modular Arithmetic Explained

You're computing the number of unique paths in a 100x100 grid. Your DP table fills perfectly. You tested on a 3x3. You traced through the transitions on paper. Then you run it on the actual input and get back -1234567890.

Not a wrong algorithm. Not a logic bug. Just integer overflow, quietly wrecking your answer. And the fix is one number you'll see in almost every counting problem's constraints: % 1000000007.

Your Answer Has 58 Digits

Counting problems produce enormous results. The number of unique paths in a 100x100 grid is C(198, 99), which has 58 digits. Fibonacci(100) is 354,224,848,179,261,915,075. Neither fits in a 32-bit integer (max: roughly 2.1 billion). The second barely squeezes into 64-bit (max: roughly 9.2 × 10^18). Push to Fibonacci(200), or count how many ways you can arrange a 20-character string, and you've left int64 so far behind it can't even see you anymore.

The problem isn't that your code is slow. The answer doesn't physically fit in any integer type your language provides.

Two options exist: use arbitrary-precision integers (Python does this automatically, which feels great right up until you hit the time limit on n=10^6), or accept that you only need the answer modulo some prime. Interview problems almost always choose option two, and they tell you upfront in the constraints. That "mod 10^9+7" in the problem statement isn't flavor text.

What actually happens when you overflow is that the number wraps around silently. A 64-bit signed integer overflows past 2^63-1 and becomes a large negative number. The computer did exactly what you told it to do. It multiplied two numbers together. It just didn't stop at the cliff edge.

I hate myself / Wow! I hate this more

You tested on a 3x3 grid. It worked perfectly. The 100x100 input had 58-digit opinions about that.

The Property That Makes Mod Actually Work

A modulus lets you carry a reduced version of your answer through every step of the computation, without ever materializing the full number.

The key property is that mod distributes over addition and multiplication:

(a + b) % M  =  ((a % M) + (b % M)) % M
(a * b) % M  =  ((a % M) * (b % M)) % M

This means you can apply % M at every intermediate step instead of at the end. You're not losing information about the final result mod M. You're just never storing the intermediate billion-digit number.

Fibonacci is the cleanest demonstration:

M = 1_000_000_007

def fib_mod(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % M
    return a

(a + b) % M never overflows because a and b are always less than M, and M + M fits comfortably in 32-bit. The result is the exact correct answer, not an approximation. You're computing the real Fibonacci number, just in a modular number system where all arithmetic wraps at M.

The same logic applies to DP transitions. If dp[i] = dp[i-1] + dp[i-2], you write dp[i] = (dp[i-1] + dp[i-2]) % M. The modulo rides along at each step, keeping all values under M, and the final cell holds the correct answer mod M.

Why 1000000007 Specifically?

It's not a lucky number someone picked. Two concrete properties make it the standard.

1000000007 is prime. Primes have a property composites don't: every non-zero element has a modular inverse. That matters for division under mod. A composite modulus breaks this guarantee, which is why you'll never see % 1000000000 in a problem statement.

It also keeps multiplication safe in 64-bit arithmetic. When you compute (a % M) * (b % M), both factors are at most M - 1, which is roughly 10^9. Their product is at most roughly 10^18. Since int64 holds up to 9.2 × 10^18, the multiplication doesn't overflow before the final % M. Java's long and C++'s long long handle this without any extra work or casting.

You'll also see 998244353 in some problems. Same two properties, but with additional structure that makes polynomial multiplication (the Number Theoretic Transform) fast. For general interview problems, 1e9+7 is the one to have memorized.

The Subtraction Trap Nobody Warns You About

Addition and multiplication carry through cleanly. Subtraction hides a sharp edge that catches almost everyone eventually.

a = 5
b = 10
M = 1_000_000_007

# Looks correct. Passes most test cases. Then silently breaks.
result = (a - b) % M  # Python: 999999002 ✓   Java/C++/Go: -5 ✗

# Correct in every language
result = ((a - b) % M + M) % M

Python's % always returns a non-negative result. (-5) % M in Python gives M - 5, which is mathematically correct. Java, C++, and Go return a result with the sign of the dividend. (-5) % M gives -5, which flows into the rest of your DP and corrupts every value downstream.

Always write ((a - b) % M + M) % M for subtraction. The extra + M costs nothing and saves you from the test case where b > a.

Division is a separate problem. You cannot compute (a / b) % M as ((a % M) / (b % M)) % M. Division doesn't distribute the same way addition and multiplication do. To divide by b under mod, you need the modular inverse of b.

For a prime modulus M, the modular inverse of b is b^(M-2) % M. This comes from Fermat's little theorem: for prime p and integer a not divisible by p, a^(p-1) ≡ 1 (mod p). So a^(p-2) * a ≡ 1 (mod p), making a^(p-2) the multiplicative inverse of a.

def mod_inverse(b, M):
    return pow(b, M - 2, M)  # 3-arg pow uses fast exponentiation internally

# Division: (a / b) % M = (a * mod_inverse(b, M)) % M

Computing the inverse takes O(log M) time via binary exponentiation. Fast, but not free when you're calling it in a loop for each combination query.

Thoughts on concurrency: every solution creates new problems

"I know, I'll just add % M everywhere." And now the subtraction returns -5.

`% M` Doesn't Change Your Complexity

When a problem says "return the answer mod 10^9+7," the conceptual answer might have hundreds of digits. The work to produce it grows with your DP state space, not with the magnitude of the answer.

Adding % M to each step is O(1). It doesn't change your solution's overall complexity at all. A grid path count that runs O(nm) stays O(nm). The modulo is one arithmetic operation per cell.

Where complexity actually matters is precomputing combinations for large n:

M = 1_000_000_007

def precompute(n):
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i-1] * i % M

    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], M - 2, M)
    for i in range(n - 1, -1, -1):
        inv_fact[i] = inv_fact[i+1] * (i+1) % M

    return fact, inv_fact

def nCr(n, r, fact, inv_fact):
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[r] % M * inv_fact[n-r] % M

Precomputing factorials up to n is O(n). Each combination query after that is O(1). Without precomputation, each query costs O(n) for the factorial or O(log M) for the inverse. Calling either inside a loop turns a fast solution slow in a hurry.

What the Constraint Actually Tells You

The phrase to watch for: "return the answer modulo 10^9+7." Once you see it, the real answer is too large to store and you need % M throughout every DP transition. The dynamic programming framework applies to all the problem types below. The mod just rides along for the count.

If the problem counts something and has large input bounds, the count grows faster than O(n). That's when you need the mod.

Common types:

Counting paths in grids. LeetCode 62 (Unique Paths) has small enough inputs that overflow doesn't hit, but harder variants do.
Fibonacci and linear recurrences with large n. "Find the n-th Fibonacci where n can be up to 10^9" forces matrix exponentiation with modular arithmetic.
Counting valid strings. "How many binary strings of length n have no two consecutive 1s?" grows exponentially fast.
Combinations in DP. Partition problems, subset counting, knapsack variants where the answer is the count, not the value.
Permutation counting. Any problem asking how many orderings satisfy some property.

The modular arithmetic primer covers the number theory underneath if you want the full picture.

Two Bugs That Look Like One

Correctly applying modular arithmetic shows you understood the full problem. Overflow on large inputs shows you only tested small ones.

First failure mode: applying mod in some places but not others. Add % M to the DP transition but forget an intermediate multiplication, and overflow still happens. The result looks plausible on medium inputs and explodes on large ones. It's the kind of bug that passes 42 test cases and fails on 43.

Second: forgetting the subtraction fix. (dp[i] - dp[j]) % M looks right and passes most tests. Then one case has dp[j] > dp[i] and you get a large negative number flowing through the rest of the table.

If your counting DP produces wrong answers only on large inputs, check two things: % M at every single step, and ((a - b) % M + M) % M for every subtraction. Those two checks catch most overflow bugs. The third thing worth auditing is any division in your formula. If you're computing nCr or any ratio, you need modular inverse, not plain division.

Practice the mechanical application and the pattern recognition at SpaceComplexity. Explaining why you're adding a modulus while coding under time pressure is a different skill from knowing the rule on paper.