What Is a Bitmask? How One Integer Represents Every Combination

Somewhere in your interview prep, you hit a problem where the editorial says "use bitmask DP." You stare at code with 1 << i and state | (1 << j). You scroll past it. You close the tab. That was the mistake.

Bitmasks are not advanced. They are a clever use of something you already understand: integers in binary. Once the mechanic clicks, a whole class of bit manipulation problems becomes trivially recognizable. The editorial stops looking scary and starts looking like a gift.

An Integer Is Already a Data Structure

Your language charges you for everything. Arrays need allocation. Hash sets need hashing. Lists link pointers. A 32-bit integer just exists, and it comes with 32 slots already baked in.

A bitmask is an integer you use to represent a set of flags, options, or elements, where each bit position stands for one element being in or out.

Say you have five friends and want to represent every possible group. Instead of a list, use a 5-bit integer. If bit 3 is 1, friend 3 is in the group. If it is 0, they are not. The integer 0b10110 (binary 22) means friends 1, 2, and 4 are included (zero-indexed from the right).

Bitmask visualization: a 5-bit integer where lit bits represent group members

That gives you 2^5 = 32 possible groups, each represented by a unique integer from 0 to 31. No lists, no sets, just arithmetic on integers your CPU handles natively. Free real estate.

Six Operators, That's It

Everything you do with bitmasks uses six operators. There are 300+ React hooks. There are six of these. Memorize all six.

Operator	Symbol	What It Does
AND	`&`	Both bits must be 1
OR	`\|`	At least one bit is 1
XOR	`^`	Exactly one bit is 1
NOT	`~`	Flip all bits
Left shift	`<<`	Move bits left, fill zeros on right
Right shift	`>>`	Move bits right

A quick mental model: AND is intersection, OR is union, XOR is symmetric difference.

a = 0b1010  # 10
b = 0b1100  # 12

print(a & b)   # 0b1000 = 8   (only bits both have set)
print(a | b)   # 0b1110 = 14  (bits either has set)
print(a ^ b)   # 0b0110 = 6   (bits exactly one has set)
print(a << 1)  # 0b10100 = 20 (shift left by 1 = multiply by 2)
print(a >> 1)  # 0b0101 = 5   (shift right by 1 = floor divide by 2)

One gotcha deserves a loud callout. ~a in Python gives -(a+1), not what you expected. Python integers have arbitrary precision, so "flip all the bits" is undefined unless you pick a width. In Java, C++, and Go, ~n flips all 32 or 64 bits and the result is predictable two's complement. This matters for the clear-bit operation below, and it will bite you silently on a LeetCode Python submission if you forget it.

The Five Operations You Will Actually Use

These patterns appear in nearly every bitmask problem. Know all five cold. Not "vaguely remember," cold.

Check if bit k is set:

(n >> k) & 1  # returns 1 if bit k is set, 0 otherwise

Shift the bit you care about into position zero, then AND with 1 to isolate it.

Set bit k (turn it on):

n | (1 << k)

Create a mask with only bit k as 1, then OR it into n. The rest of n is untouched.

Clear bit k (turn it off):

n & ~(1 << k)

Create a mask with bit k as 0 and everything else as 1, then AND it in.

Toggle bit k:

n ^ (1 << k)

XOR flips exactly bit k. If it was 1 it becomes 0, and vice versa.

Count the number of set bits:

bin(n).count('1')   # works anywhere
n.bit_count()       # Python 3.10+

In Java: Integer.bitCount(n). In C/C++ with GCC: __builtin_popcount(n). In Go: bits.OnesCount(uint(n)).

One more trick worth keeping. To check if n is a power of two:

n > 0 and (n & (n - 1)) == 0

Subtracting 1 from a power of two flips all the lower bits. ANDing with the original gives zero if and only if there was exactly one set bit. This is also the core of Brian Kernighan's algorithm: n & (n - 1) strips the lowest set bit, so looping until n == 0 counts set bits in O(number of set bits) rather than O(bit width).

Why Bitmasks Are Fast

Every bitmask operation runs in O(1) time and uses O(1) space. A single CPU instruction handles AND, OR, or XOR on 64-bit integers. No allocation, no pointer chasing, no iteration.

Compare that to a Python set. Membership testing is amortized O(1), but there is hashing overhead, object allocation, and memory proportional to the number of elements. A bitmask representing 30 elements fits in one 64-bit register. A Python set with 30 integers takes roughly 2-3 KB depending on values. The register does not care. The register does not even know you are having this conversation.

For individual flag operations, this difference is irrelevant. For bitmask DP, where you compute millions of states, it matters a lot.

The space implication of bitmask DP: representing subsets of an n-element set gives you 2^n possible states. At n = 20, that is roughly one million states. Manageable. At n = 32, that is over four billion. That is a "submit and walk away for an hour" situation, not a solution.

The interview signal for bitmask DP is a small n (almost always 20 or fewer) combined with "track which items you have visited or chosen." When you see that constraint, stop looking for a polynomial DP and start thinking exponential state space.

Three Patterns That Keep Appearing

XOR Cancellation

XOR has two properties that solve a surprising number of problems:

x ^ x = 0 (any value XOR'd with itself is zero)
x ^ 0 = x (any value XOR'd with zero is itself)

XOR all elements of an array where every value appears twice except one. Every duplicate cancels out. What remains is the single unpaired element. This is LeetCode 136: Single Number, O(n) time, O(1) space. It feels like a magic trick. It is not magic. It is just XOR.

def single_number(nums: list[int]) -> int:
    result = 0
    for n in nums:
        result ^= n
    return result

The same cancellation property handles "find the missing number in 0..n." XOR all array values against all indices from 0 to n. Present values cancel their index. What survives is the missing index.

Subset Enumeration

To generate all subsets of an n-element array, iterate integers from 0 to 2^n - 1. For each integer mask, bit k being set means element k belongs to this subset.

def all_subsets(arr: list) -> list[list]:
    n = len(arr)
    result = []
    for mask in range(1 << n):
        subset = [arr[k] for k in range(n) if (mask >> k) & 1]
        result.append(subset)
    return result

This runs in O(n * 2^n): you inspect n bits for each of the 2^n masks. Each element appears in exactly 2^(n-1) subsets, which you can verify by summing the popcount across all masks.

This pattern appears in problems that ask you to check all possible groupings or partitions when n is small. If you have seen the backtracking approach to subsets, bitmask enumeration is the non-recursive equivalent. Both are O(n * 2^n) in time; the bitmask version often has better constant factors because it avoids the overhead of recursive calls.

Bitmask DP

The classic application is the Traveling Salesman: find the shortest route visiting every city exactly once.

The state is dp[mask][i], where mask is a bitmask of visited cities and i is the current city. For each state you extend to unvisited cities by OR-ing in their bit.

def tsp(dist: list[list[int]]) -> int:
    n = len(dist)
    INF = float('inf')
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # started at city 0

    for mask in range(1 << n):
        for u in range(n):
            if dp[mask][u] == INF:
                continue
            if not (mask >> u) & 1:
                continue
            for v in range(n):
                if (mask >> v) & 1:
                    continue  # already visited
                new_mask = mask | (1 << v)
                dp[new_mask][v] = min(dp[new_mask][v], dp[mask][u] + dist[u][v])

    full_mask = (1 << n) - 1
    return min(dp[full_mask][i] + dist[i][0] for i in range(n))

Dog sitting calmly with eyes closed, captioned "When the interview is going really well and the interviewer asks you a dynamic programming problem"

When the constraints say n ≤ 20 and you know exactly where this is going.

The Held-Karp algorithm runs in O(n^2 * 2^n) time and O(n * 2^n) space. For n = 20 that is roughly 400 million operations and 20 MB. For n = 25 it crosses one billion operations and becomes impractical. Know the cutoff.

Bitmask DP comes up on problems beyond TSP: minimum cost to assign k jobs among k workers, counting paths through a small directed graph without revisiting nodes, and similar "track visited state" problems where the set fits in an integer.

For the DP mechanics, the dynamic programming framework covers how to identify states and transitions before you write code.

Where Languages Diverge

The operators (&, |, ^, <<, >>) are consistent across every major interview language. The differences are minor but worth knowing before they surprise you mid-interview.

Python: Arbitrary-precision integers, so ~n gives -(n+1). Mask to 32 bits with & 0xFFFFFFFF when needed. No fixed integer overflow.

Java: Fixed 32-bit int and 64-bit long. ~n is the two's complement NOT. Arithmetic right shift for signed, >>> for unsigned.

C++: int is typically 32-bit, long long is 64-bit. ~n flips all bits. Use unsigned types when you want logical right shift.

Go: Uses ^ for both XOR (binary) and bitwise NOT (unary). ^n in Go means NOT, while a ^ b means XOR. This looks like a typo. It is not a typo. It will haunt you once and then you will remember it forever.

Rust: !n is bitwise NOT (not ~). Everything else is standard.

For most interview problems you do not need the edge cases. They matter when you implement 32-bit integer problems like LeetCode 190 (Reverse Bits) in Python and the sign bit behavior bites you.

When Not to Use a Bitmask

Not every set problem wants a bitmask. Specifically: not the ones where n is large.

If the set can be large (more than 60 or so elements), you need arbitrary-precision integers, which lose the O(1) constant-factor advantage. Use a hash set.

If your bitmask DP has n > 20, the 2^n state space is likely too large. Look for a polynomial DP formulation. See recognizing when DP applies for the signals.

In production code, a plain set is almost always clearer. The performance gap only matters at bitmask DP scale, not for individual flag checks. Write the readable thing until you have a profiler telling you otherwise.

Quick Reference

Operation	Code
Check bit k	`(n >> k) & 1`
Set bit k	`n \| (1 << k)`
Clear bit k	`n & ~(1 << k)`
Toggle bit k	`n ^ (1 << k)`
Count set bits	`bin(n).count('1')`
Power of two check	`n > 0 and (n & (n - 1)) == 0`
Strip lowest set bit	`n & (n - 1)`
Isolate lowest set bit	`n & (-n)`
All subsets of n elements	`for mask in range(1 << n)`
Check if k not yet visited	`not (mask >> k) & 1`
Mark k as visited	`mask \| (1 << k)`

The 14 bit manipulation problems list covers the full range from XOR tricks to bitmask DP, ordered by what each one teaches. Recognizing bit manipulation patterns is worth a read before your next session if you want to build the signal-detection instinct rather than memorize solutions.

When you are ready to test your explanations under pressure, SpaceComplexity runs voice-based mock interviews where the rubric scores how well you communicate your reasoning, not just whether you got the answer.