HashMap vs Array for Counting: How to Choose Every Time

June 7, 202610 min read
dsaalgorithmsinterview-prepdata-structures
HashMap vs Array for Counting: How to Choose Every Time
TL;DR
  • Bounded key space (26 letters, 128 ASCII, 10 digits) means use an array: direct index arithmetic, no hashing, and 5-10x better cache performance
  • Unbounded or non-integer keys (arbitrary integers, strings, tuples) require a HashMap since you cannot preallocate a fixed array
  • The decision question: can keys map to small integers in a known range? Yes means array, no means HashMap
  • Cache locality explains most of the speed gap: a 26-element frequency array fits in one CPU cache line; HashMap values scatter across heap
  • Sparse key space is the exception: if the range is large but few distinct keys appear, HashMap uses less memory than a fixed array
  • Interview signal: naming this tradeoff unprompted, and justifying which structure fits the problem, lifts an algorithms answer from score 3 to score 4

You have a string. You want to count how many times each character appears. Two minutes in, fingers moving before your brain catches up, you type freq = {}. Natural. Automatic. Probably wrong.

This is the HashMap reflex. It happens to everyone. The data structure is so versatile, so forgiving, so universally applicable that it becomes the default answer to every counting question. The problem is that for most counting problems, a plain array is faster, simpler, and uses less memory. In the hashmap vs array decision for counting, HashMap is the flexible default, not the optimal one. Knowing which to reach for first is the kind of small judgment that separates clean interview code from code that technically works but quietly signals you haven't thought it through.

Every Counting Problem Has the Same Shape

Given some collection of keys, build a map from key to count, then do something with it. Check if two strings are anagrams. Find the most frequent element. Count distinct characters in a sliding window. Verify a permutation exists inside a larger string.

The data structure you pick for the counting step determines everything downstream. Get it wrong and you pay in speed, memory, or code clarity. Sometimes all three at once.

The split comes down to one question: can the keys be mapped to small integers?

When an Array Destroys the HashMap

An array works when your keys map cleanly to small integers in a known range. The classic case is lowercase ASCII letters.

def is_anagram(s: str, t: str) -> bool: if len(s) != len(t): return False count = [0] * 26 for c in s: count[ord(c) - ord('a')] += 1 for c in t: count[ord(c) - ord('a')] -= 1 return all(x == 0 for x in count)

ord(c) - ord('a') maps 'a' to 0, 'z' to 25. Every increment is a direct array write with zero hash computation. No hashing, no bucket lookup, no collision resolution. You're doing arithmetic and writing to memory. That's it.

The pattern generalizes. Numbers in range [0, k]? Allocate k + 1. Uppercase letters? ord(c) - ord('A'). Full printable ASCII? 128 covers it.

// Java: character frequency for anagram check int[] count = new int[26]; for (char c : s.toCharArray()) count[c - 'a']++; for (char c : t.toCharArray()) count[c - 'a']--; for (int x : count) if (x != 0) return false; return true;
// C++: lowercase character count int freq[26] = {}; for (char c : word) freq[c - 'a']++;

The space is exactly O(k), completely independent of input length. A million-character string and a ten-character string use the same 26-element array. You allocated it once, you're done.

When You Actually Need a HashMap

HashMap earns its place when the key range is unbounded or when keys are not integers at all: arbitrary strings, large numbers, tuples, the cursed things your coworker puts in a set.

from collections import Counter def top_k_frequent(nums: list[int], k: int) -> list[int]: freq = Counter(nums) return sorted(freq, key=freq.get, reverse=True)[:k]

If nums can contain any integer from -10^9 to 10^9, you cannot allocate a 2-billion-element array. HashMap is the only sane option. Space is O(n) where n is the number of distinct keys, growing with input rather than sitting fixed at allocation.

Python's Counter is idiomatic and fast. Java uses HashMap<Integer, Integer> with getOrDefault. JavaScript uses Map or a plain object. C++ uses std::unordered_map. Each carries overhead that a plain array doesn't: hash computation, potential rehashing when the load factor crosses a threshold, and pointer indirection to reach stored values.

For word frequency counting, there is no alternative. The key space is effectively infinite.

The Real Difference Between Them

ArrayHashMap
ReadO(1)O(1) average, O(n) worst
WriteO(1)O(1) amortized
SpaceO(k), fixedO(n), grows with distinct keys
Key typeInteger (bounded)Any hashable
Cache behaviorExcellentPoor
InitializationO(k)O(1)

The "worst case O(n)" for HashMap is not theoretical paranoia. Adversarial inputs with deliberate hash collisions can degrade lookup to O(n). Python randomizes its hash seed per process startup specifically to prevent this. Arrays have no such failure mode: index arithmetic is deterministic. Nobody is crafting inputs to make count['a'] slow.

Initialization is also worth watching. [0] * 26 is cheap. [0] * 100000 less so. If your range is large but sparse (many possible keys, but only a few actually occur), HashMap wins on space even if it loses on speed.

Cache Locality Is the Hidden Difference

Both operations are O(1). But the constant matters a lot when you're running 10^4 iterations in a tight inner loop. This is where the array quietly laps the HashMap while the HashMap is still looking up its keys.

Array vs HashMap memory layout comparison showing contiguous vs scattered allocations

An array is a single block of contiguous memory. When your CPU loads count[3], it fetches a 64-byte cache line that also contains count[4] through roughly count[19]. The next sixteen accesses to nearby indices are essentially free. The hardware prefetcher detects the sequential stride and loads ahead automatically.

A HashMap stores each bucket as a separate heap allocation, often scattered across memory. Reaching a value means computing the hash, finding the bucket, and following a pointer. That pointer dereference is a cache miss if the bucket isn't in cache. With chaining, you follow a linked list. With open addressing, you probe around the table. Either way, the CPU is waiting, and the hardware prefetcher has completely given up on you.

Benchmarks consistently show arrays running 5-10x faster than HashMaps for small bounded counting problems. A 26-element letter frequency array fits in a single cache line. You miss exactly once, ever, and the whole structure lives in L1. This is the same principle covered in cache-friendly code.

One Question Settles HashMap vs Array

Ask: is the key space bounded and small?

If yes, use an array. Set the size to the range, use arithmetic to compute the index.

If no, use a HashMap. You have no choice when keys are arbitrary strings, large integers, or non-integer types.

"Small" is loosely in the hundreds. 26 for lowercase letters, 128 for ASCII, 10 for digits. An array of size 10^6 costs 4 MB for int32, which may or may not be acceptable depending on constraints. An array of size 10^9 is never acceptable. (Your interviewer will have questions. So will your laptop fan.)

There's a second check: is the key space sparse? If your integers range from 0 to 10^6 but only 20 distinct values appear, a HashMap uses far less memory even if an array would be faster per access.

Three Interview Problems, Two Data Structures

Valid Anagram (LeetCode 242). Two strings over lowercase letters. Range is exactly 26. Use an array. Increment for one string, decrement for the other, check all zeros. No HashMap needed.

Top K Frequent Elements (LeetCode 347). Integers from -10^9 to 10^9. Range is too large. Use a HashMap for frequencies, then a min-heap of size k or bucket sort. HashMap is the only correct option for the counting step.

Minimum Window Substring (LeetCode 76). Pattern string over ASCII characters. Range is 128. You can use an array of 128 and avoid HashMap overhead entirely in the sliding window loop. Many accepted solutions use a HashMap out of habit, and they pass because the test cases aren't adversarial and LeetCode doesn't grade your aesthetics. But the array version has better cache behavior, simpler code, and one fewer excuse to reach for the wrong tool by default. Passing is not the same as optimal.

Code You'll Actually Write

# Array: lowercase letter frequencies freq = [0] * 26 for c in word: freq[ord(c) - ord('a')] += 1 # Array: digit frequencies freq = [0] * 10 for c in s: freq[int(c)] += 1 # Array: ASCII character frequencies freq = [0] * 128 for c in s: freq[ord(c)] += 1 # HashMap: arbitrary integers or strings (no array shortcut exists) from collections import Counter freq = Counter(collection)

In Java and C++, the performance gap is even larger because HashMap implementations carry more per-entry overhead:

// Java: array (fastest for bounded keys) int[] freq = new int[26]; for (char c : s.toCharArray()) freq[c - 'a']++; // Java: HashMap (necessary for unbounded keys) Map<Integer, Integer> freq = new HashMap<>(); for (int n : nums) freq.merge(n, 1, Integer::sum);
// C++: array (bounded) int freq[26] = {}; for (char c : s) freq[c - 'a']++; // C++: unordered_map (unbounded) unordered_map<int, int> freq; for (int n : nums) freq[n]++;

Why This Decision Shows Up in Interviews

The choice of counting structure rarely breaks correctness. It shows up in complexity analysis and design discussion. And in whether your interviewer scribbles a question mark next to "technical depth" or a checkmark.

If you say "I'm using a HashMap so lookup is O(1)" without noting that an array would also be O(1) but with better constants, you've missed a depth signal. A strong interviewer may ask: "Could you use an array here?" If your answer is "I didn't think about it," that's a gap. If your answer is "Yes, since we're only counting lowercase letters the key range is bounded at 26, so an array of size 26 gives better cache performance," you've shown you understand what's actually happening in memory.

Naming tradeoffs before being asked is what separates scores of 3 and 4 on the algorithms dimension. The ability to articulate it live, under pressure, while also writing correct code, is something you train. SpaceComplexity runs voice-based mock interviews with rubric scoring on exactly these dimensions, so you hear your own explanations before they matter.

The array vs hash map tradeoff guide goes deeper on when O(1) really means O(1) for both structures, including the amortized analysis for resizing. For complexity claims you'll need to defend, the hash map time complexity breakdown covers average case, worst case, and what actually causes degradation.

Key Takeaways

  • Use an array when keys are small, bounded integers. Lowercase letters = 26, ASCII = 128, digits = 10.
  • Use a HashMap when keys are unbounded, large integers, strings, or non-integer types.
  • Arrays have 5-10x better cache locality for small ranges. Both are O(1) access. The gap is in the constant, the space, and worst-case failure modes.
  • Naming the tradeoff in an interview, even when you choose HashMap, shows more depth than choosing silently.
  • Python's Counter, Java's HashMap.merge, and C++'s unordered_map are the idiomatic patterns. Know the array pattern in each language too.

Further Reading