Array vs Hash Map: They're Both O(1). They're Nothing Alike.

May 27, 202611 min read
dsaalgorithmsinterview-prepdata-structures
Array vs Hash Map: They're Both O(1). They're Nothing Alike.
TL;DR
  • Arrays give O(1) access by position with one address calculation and zero hashing overhead
  • Hash maps give O(1) access by arbitrary key but carry 2-4x memory overhead per entry
  • Cache locality makes array iteration 5-10x faster than hash map iteration despite identical Big-O
  • Direct-address tables (arrays as hash maps) beat hash maps when keys are bounded integers like letter frequencies
  • Hash maps are irreplaceable for arbitrary, sparse, or non-integer keys like Two Sum complements and anagram groups
  • One question decides it: can you compute a small integer index from the key? Array if yes, hash map if no

Both arrays and hash maps promise O(1) operations. Both show up in every DSA course by week two. And yet picking the wrong one turns a clean solution into a bloated mess that makes your interviewer quietly close the laptop lid. The distinction is what you already know when you reach for the data.

An array gives you O(1) access by position. A hash map gives you O(1) access by key. That single fact drives every tradeoff that follows: memory, cache behavior, and insertion cost.

What Does the Array Actually Do?

An array is a contiguous block of memory. Every element sits next to the one before it. When you write arr[i], the CPU computes a single address:

address = base + i * element_size

One multiply, one add, one memory fetch. No hashing, no bucket lookup, no collision handling. The index is the address. Fast food drive-through of data access. (For more on how dynamic arrays maintain this property while growing, see how append() actually works.)

Contiguous array memory layout showing base address, cells arr[0] through arr[7] with byte offsets, and a 64-byte cache line bracket Eight 4-byte integers packed into a single 64-byte cache line. One memory fetch loads all of them.

This layout has a second advantage invisible to Big-O. Modern CPUs load memory in 64-byte cache lines. When you read arr[0], the hardware loads arr[0] through arr[15] (for 32-bit integers) into L1 cache in one shot. The next 15 reads are free. You ordered one cheeseburger and they threw in fifteen more. The prefetcher detects the sequential pattern and pre-loads the next line before you ask.

An L1 cache hit costs about 1-2 nanoseconds. L2 costs 5-10 ns. Main memory costs 80-100 ns. Arrays that fit in cache are grotesquely fast. Arrays that don't still benefit from prefetching, because the access pattern is predictable.

What Does the Hash Map Actually Do?

A hash map takes a key, runs it through a hash function, and uses the result to pick a slot in an internal array:

slot = hash(key) % len(buckets)

If two keys land in the same slot (a collision), the map resolves it. Chaining hangs a linked list off each slot. Open addressing probes forward until it finds an empty one. Either way, average-case lookup is O(1), assuming a good hash function distributes keys uniformly. That assumption is doing heavy lifting.

Hash map lookup pipeline showing key alice hashing to bucket 3, with bob colliding at the same bucket and forming a chain The full journey of a hash map lookup. "alice" and "bob" both hash to bucket 3, forming a collision chain.

"O(1)" here means something different than for arrays. (Full analysis of hash map time complexity.) The hash map performs:

  1. Compute the hash of the key
  2. Compute the modulo to find the bucket
  3. Walk the chain (or probe sequence) comparing keys
  4. Return the value

That's at minimum three operations where the array needed one. Under good conditions it's still constant time. Under bad conditions (many collisions), it degrades to O(n). The 2011 28C3 hash-flooding attack proved it: 65,536 carefully chosen keys caused every key to collide, locking a PHP server's CPU core for 30 seconds. Nothing says "constant time" like waiting half a minute.

The Complexity Table That Actually Matters

OperationArrayHash Map
Access by indexO(1)N/A
Search by value / keyO(n) scanO(1) by key
Insert at endO(1) amortizedO(1) amortized
Insert at positionO(n) shiftN/A
Delete by indexO(n) shiftN/A
Delete by keyO(n) scan + shiftO(1) by key
Check existenceO(n) scanO(1)
Ordered iterationO(n), already sorted by indexO(n), no guaranteed order

Two things jump out. The array is unbeatable when you already know the index. Ask a hash map for "the third element" and watch it stare at you blankly. The hash map is unbeatable when you need "is this key present?" The array can only answer that by checking every element, like a bouncer without a guest list.

The array indexes by position. The hash map indexes by identity. They don't compete. They solve different problems.

The Memory Tax Nobody Talks About

Arrays store your data and nothing else. 1,000 integers = roughly 4,000 bytes. The Marie Kondo of data structures.

Hash maps store your data, the keys, the hashes, and a pile of bookkeeping pointers. Daniel Lemire benchmarked Java's HashMap storing 100,000 Integer-to-Integer pairs. The HashMap used 74.5 bytes per entry. A two-array approach used 40 bytes per entry. Nearly 2x overhead just for the privilege of saying map.get().

Side-by-side memory comparison showing array at approximately 4 bytes per entry versus HashMap at approximately 74.5 bytes per entry At one million entries, the array costs about 4 MB. The HashMap costs about 71 MB. Same data, 18x more memory.

String keys make it worse. Lemire measured 152.9 bytes per String-to-Integer entry. Each HashMap.Entry node costs 32 bytes on its own, before you count the key or value objects.

In Python, the situation improved with 3.6 (Hettinger's compact dict redesign cut memory 20-25%), but dictionaries still store three words per entry: key reference, value reference, and full hash.

A million entries? Those extra bytes add up to tens of megabytes. Your hash map is eating RAM like it's training a language model.

Why the Same O(1) Feels Completely Different

Iterate over a million integers and sum them in an array. Do the same with a hash map. Both are O(n). Both touch every element once. The array finishes 5-10x faster.

The reason is cache locality. The array's elements sit in contiguous memory. The CPU prefetches the next cache line while you're still processing the current one. Almost every access is an L1 hit at 1-2 ns.

The hash map's entries are scattered across the heap like socks after laundry day. Following a pointer from the bucket array to an Entry node to the key to the value means bouncing around memory randomly. The prefetcher gives up. Every hop risks a cache miss at 80-100 ns, a 50-100x penalty.

Cache locality comparison showing array scan with fast sequential L1 cache hits versus hash map iteration with slow scattered DRAM misses Top: array scan hits L1 cache at 2ns per element. Bottom: hash map chases pointers at 80-100ns per element. Same Big-O, wildly different wall-clock time.

Go benchmarks measure array access at ~60 ns/op and hash map access at ~124 ns/op for the same data. Same Big-O. Double the wall-clock. Big-O didn't lie exactly. It just left out the part that matters.

When an Array Does the Hash Map's Job Better

One pattern saves interviews and saves memory: when your keys come from a small, bounded range, use an array as a direct-address table.

Counting lowercase letter frequencies? Don't use a HashMap<Character, Integer>. Use int[26]. Your interviewer will silently award you bonus points. Or at least stop silently deducting them.

def is_anagram(s: str, t: str) -> bool: if len(s) != len(t): return False count = [0] * 26 for c in s: count[ord(c) - ord('a')] += 1 for c in t: count[ord(c) - ord('a')] -= 1 return all(x == 0 for x in count)

Faster, less memory, no hashing. The character 'a' maps to index 0, 'b' to 1, and so on. No collisions possible. CLRS calls this a direct-address table (Chapter 11.1), the structure hash tables were invented to replace only when the key universe grows too large.

The same trick applies everywhere keys are bounded integers:

  • Counting sort: count[value] instead of a frequency map
  • BFS/DFS visited tracking: visited[node_id] when nodes are numbered 0 to n-1
  • ASCII frequency: int[128] instead of HashMap<Character, Integer>
  • Digit counting: int[10] for digits 0-9

If you can compute the index directly from the key, you don't need a hash map. The array is the hash map, with a perfect hash function (the identity function) and zero overhead. It's like building a Rube Goldberg machine to flip a light switch when the switch is right there.

When Only a Hash Map Will Do

The array trick breaks when keys are unbounded, sparse, or non-integer. Which is most of real life.

Take Two Sum. Find two indices whose values sum to a target.

def two_sum(nums: list[int], target: int) -> list[int]: seen = {} for i, num in enumerate(nums): complement = target - num if complement in seen: return [seen[complement], i] seen[num] = i

The keys here are arbitrary integers, possibly negative, huge, or sparse. Allocating int[2_000_000_000] to store six values is a choice. Not a good one. The hash map's ability to map arbitrary keys to values in O(1) is the whole point.

Other cases where hash maps are irreplaceable:

  • Grouping by arbitrary key: anagram grouping, where the key is a sorted string
  • Deduplication: checking if you've seen an element before in an unbounded stream
  • Frequency counting over sparse data: word counts in a document
  • Memoization caches: mapping function inputs to outputs

When Should You Use an Array vs a Hash Map?

Before you default to HashMap, run through this:

  1. Are your keys integers in a small, known range? Use an array. count = [0] * (max_val + 1) beats any hash map for bounded keys.

  2. Do you need lookup by position (index)? Use an array. Hash maps don't even understand the question.

  3. Do you need O(1) lookup by arbitrary key? Use a hash map. Scanning an array is O(n), and your interviewer is watching.

  4. Do you need ordered iteration? Use an array (already ordered by index) or a TreeMap. Hash maps give no order guarantees.

  5. Is memory tight? Favor arrays. Hash maps carry 2-4x overhead per entry depending on the language.

  6. Sequential processing? Arrays win on cache performance. Hash maps scatter memory like confetti.

Decision flowchart for choosing between array and hash map based on key type, lookup needs, and ordering requirements When in doubt, start at the top. One question usually settles it.

The Mistake That Costs You in Interviews

Using a hash map for bounded keys wastes memory for no benefit. When an interviewer sees HashMap<Character, Integer> for letter frequency counting, they know you didn't think about the key space. An int[26] is cleaner, faster, and communicates "I understand what's happening here."

Scanning an array to check existence is worse. A nested loop to check whether an element appears in another list is O(n^2) code that a hash set solves in O(n). This is the single most common O(n) to O(1) upgrade in coding interviews.

The fix is mechanical. Ask: what operation do I need to be fast? If it's "access by position," array. If it's "lookup by key" or "check existence," hash map. Practice that question on SpaceComplexity with a voice-based mock, and it becomes automatic.

Recap

  • Arrays give O(1) by position. One multiply, one add, one fetch. Hash maps can't do positional access at all.
  • Hash maps give O(1) by key. Hash, mod, probe. Arrays need a full O(n) scan to find a value.
  • Arrays use 2-4x less memory per element than hash maps, which store keys, hashes, and pointers alongside your data.
  • Arrays dominate on cache performance. Contiguous memory lets the prefetcher work. Hash maps scatter entries across the heap.
  • Use an array as a direct-address table when keys are bounded integers (letter counts, digit counts, visited arrays). It's a hash map with zero overhead.
  • Use a hash map when keys are arbitrary, sparse, or non-integer. Two Sum, anagram grouping, and deduplication all need this.
  • Ask one question: can I compute a small integer index from the key? If yes, array. If no, hash map.

Further reading