What Is the Fisher-Yates Shuffle? Why Your First Attempt Is Biased

June 5, 20268 min read
dsaalgorithmsinterview-prepdata-structures
What Is the Fisher-Yates Shuffle? Why Your First Attempt Is Biased
TL;DR
  • Naive shuffle draws from the full range at every step, producing n^n paths that don't divide evenly into n! permutations — some arrangements appear systematically more often
  • Fisher-Yates shrinks the selection range at each step, producing exactly n! paths for n! permutations and guaranteeing uniform probability
  • The only code difference: draw j from [0, i] not [0, n-1] — one character separates biased from correct
  • Sattolo's algorithm uses [0, i-1] and produces derangements where no element stays in place; looks like Fisher-Yates but isn't
  • O(n) time, O(1) space: each element touched exactly once, shuffled in-place with no auxiliary array
  • LeetCode 384 asks for this directly; interviewers follow up with the n^n vs n! counting argument to test understanding, not just memorization

You have an array. Shuffle it. Uniformly random, every permutation equally likely. Go ahead, try it before reading further.

Most people write something that looks correct. They feel good about it. They ship it. It's wrong. The result is biased: some arrangements appear far more often than they should, others almost never. The bug is invisible until you run an experiment. The fix is one character.

Fisher-Yates is the correct algorithm. O(n) time, O(1) space, every permutation with exactly equal probability. Understanding why it works, and why the naive version fails, is what interviewers actually test.

The Shuffle Most People Write First

Intuitive approach: for each position, pick a random index from anywhere in the array and swap.

import random def naive_shuffle(arr): n = len(arr) for i in range(n): j = random.randint(0, n - 1) # random from full range arr[i], arr[j] = arr[j], arr[i] return arr

This looks completely reasonable. You'd be proud of this code. You'd write it in an interview with confidence. You'd be wrong.

For a 3-element array [A, B, C], there are exactly 3! = 6 possible permutations. A fair shuffle assigns each one probability 1/6, or about 16.67%.

But this algorithm makes 3 random choices, each from 3 positions. That's 3³ = 27 total execution paths. And 27 does not divide evenly by 6.

The shuffle is biased because the number of execution paths doesn't divide evenly into the number of permutations. 27 paths across 6 permutations means 4.5 per permutation on average. Since you can't have half a path, some permutations get reached via 5 paths (appearing 18.5% of the time) and others via only 4 paths (14.8%). Neither is the expected 16.67%.

The bias is real, consistent, and measurable. Run 100,000 trials and you'll see it clearly. The identity permutation [A, B, C] appears roughly 14% of the time while [B, A, C] appears roughly 19%. Not a rounding error. A systematic flaw baked into the algorithm's structure.

Factorio developer discovering code 'works at the first try' and finding it deeply suspicious

The naive shuffle: passes every eyeball test, fails every statistical one.

The Fix: Shrink the Range

When picking a random element for position i, only consider elements you haven't placed yet.

Ronald Fisher and Frank Yates described this in 1938 as a paper-based statistical sampling method. Richard Durstenfeld adapted it to run in O(n) in 1964 using in-place swaps instead of copying to a separate array. Donald Knuth included it in "The Art of Computer Programming" in 1969, which is why you'll also see it called the Knuth shuffle.

The algorithm works backward through the array:

  1. Start at the last position (index n-1).
  2. Pick a random index j from 0 to i, inclusive.
  3. Swap elements at i and j.
  4. Decrement i and repeat until i reaches 1.

Trace through [A, B, C, D]:

Stepij (random from 0 to i)Array after swap
131[A, D, C, B]
222[A, D, C, B]
310[D, A, C, B]

Final result: [D, A, C, B].

At each step the unshuffled portion shrinks by one. Position 3 gets its permanent element in step 1. Position 2 in step 2. Position 1 in step 3. Position 0 is whatever's left. No element gets touched twice. Every element has a fair shot at every slot.

The Fisher-Yates Shuffle Code

import random def fisher_yates_shuffle(arr): for i in range(len(arr) - 1, 0, -1): j = random.randint(0, i) # shrinking range: 0 to i, inclusive arr[i], arr[j] = arr[j], arr[i] return arr
function fisherYatesShuffle<T>(arr: T[]): T[] { for (let i = arr.length - 1; i >= 1; i--) { const j = Math.floor(Math.random() * (i + 1)); [arr[i], arr[j]] = [arr[j], arr[i]]; } return arr; }

The only difference from the naive version: j is drawn from [0, i] instead of [0, n-1]. The upper bound shrinks with each step. That's the entire fix. One bound. One character.

Some implementations run forward instead of backward, picking j from [i, n-1] at each step. Both directions are correct. The backward version is more common in practice, and the one Knuth describes. Either works; just don't mix them up with the naive version that always picks from the full range.

For implementations across 14 languages including Java, Go, Rust, and C++, see the complete Fisher-Yates reference.

Why the Fix Actually Works

At step i (counting down from n-1 to 1), you choose from i+1 possible values. Multiply across all steps:

n × (n-1) × (n-2) × ... × 2 × 1 = n!

Exactly as many execution paths as there are permutations of n elements. Since each individual choice is uniformly random, every execution path has probability 1/n!. Since each path produces a distinct permutation, every permutation occurs with equal probability.

Fisher-Yates produces n! paths for n! permutations, one-to-one. The naive shuffle produces n^n paths for n! permutations. That's never one-to-one when n > 2. That's the whole story.

This is the argument you need to deliver in an interview. Not just the code, the counting argument.

O(n) Time, O(1) Space

Fisher-Yates visits each element exactly once with a constant-time swap. You can't shuffle faster than O(n) because you must touch every element at least once to put it in a random position.

It shuffles in-place with no auxiliary arrays. The memory footprint is the same whether you're shuffling ten elements or ten million. No data structure you use for storing elements is affected by n growing.

What the Interview Actually Tests

LeetCode 384 "Shuffle an Array" is the direct problem. You implement shuffle() (return a random permutation) and reset() (return to original state) as class methods. Fisher-Yates handles shuffle() directly. reset() means saving a copy of the original array on construction and returning it.

class Solution: def __init__(self, nums): self.original = nums[:] self.arr = nums def reset(self): self.arr = self.original[:] return self.arr def shuffle(self): for i in range(len(self.arr) - 1, 0, -1): j = random.randint(0, i) self.arr[i], self.arr[j] = self.arr[j], self.arr[i] return self.arr

The code is the easy part. Interviewers follow up:

  • "Why is picking from the full range wrong?" They want the n^n vs n! argument.
  • "How do you know this is uniform?" They want the n! paths = n! permutations proof.
  • "What's the space complexity?" O(1) extra, regardless of n.

The most common mistake is writing random.randint(0, n - 1) instead of random.randint(0, i). That one substitution reverts to the naive shuffle. Candidates who write the wrong version and can't articulate the bias often get passed on even after fixing it when prompted, because the follow-up revealed they'd memorized a pattern without understanding it.

Fisher-Yates also shows up embedded in other problems. If you need k random elements from an array of n, run only the first k iterations and collect positions n-1 down to n-k. That's uniform sampling without replacement, and it connects directly to reservoir sampling, which extends the same idea to streams of unknown length.

A candidate who starts with the naive version, identifies the bias unprompted, and explains the fix in real time signals strong problem-solving judgment. That trajectory tells the interviewer more than a candidate who wrote the right code immediately but couldn't say why.

If you want to practice explaining algorithms like this under real pressure, SpaceComplexity offers voice-based mock interviews with rubric-based feedback. The feedback flags whether your explanation demonstrated understanding of the underlying invariant, not just whether the code was correct.

Watch Out for Sattolo's Algorithm

Change random.randint(0, i) to random.randint(0, i - 1) and you get Sattolo's algorithm. One character. It looks nearly identical.

The result is a derangement: no element stays in its original position. Sattolo's has legitimate uses (generating cyclic permutations), but accidentally writing it by getting the loop bound wrong is one of the subtler bugs to catch in code review. The output still looks shuffled. The bias is hidden.

The boundary: randint(0, i) is Fisher-Yates. randint(0, i-1) is Sattolo.

When Fisher-Yates Breaks Down

Fisher-Yates requires O(1) random access. Shuffling a data structure where random access is O(n), like a linked list, destroys the O(n) time guarantee. For linked lists, copy to an array, shuffle, copy back.

If you need reproducible shuffles for testing, pass an explicit seed to the RNG before calling. Same code, same input, same result every time.

If you need cryptographically secure shuffling, replace Math.random() or Python's random module with crypto.getRandomValues() or Python's secrets module. Fisher-Yates is correct regardless of the RNG. The uniformity guarantee only holds if the underlying generator is itself uniform. Math.random() is not cryptographically secure.

Further Reading