Set vs Map: When Each Data Structure Is the Right Call

Both live behind a hash table. Both give you O(1) lookups. The distinction takes about ten seconds to internalize, and then you'll spend the next six months quietly judging code that uses a dict when it only ever checks keys. Pick the wrong one and your code still works. That's the problem. The bug is invisible, but the interviewer noticed.

The Core Difference Is What You're Storing

A Set stores elements. You put a value in, and later you ask: "is this value here?"

A Map stores pairs. Every key maps to a value. You put a key-value pair in, and later you ask: "what value does this key map to?"

A Set is a Map where the value doesn't matter. Most language implementations make this literal. Java's HashSet is backed by a HashMap<E, Object> with a static PRESENT placeholder as the value. A dummy object, allocated once, pointed to by every single entry forever. Technically correct. Slightly unsettling. Go has no built-in Set, so the idiomatic pattern is map[K]struct{}, using an empty struct because it consumes zero bytes. It's the most Go solution imaginable: correct, terse, and just uncomfortable enough to be memorable. Python's set does it cleanly, storing only keys with no dummy value at all.

Whenever you catch yourself building a Map but only ever using the keys, you wanted a Set.

What the Complexity Table Actually Tells You

Operation	HashSet / HashMap	TreeSet / TreeMap
Insert	O(1) avg	O(log n)
Delete	O(1) avg	O(log n)
Lookup	O(1) avg	O(log n)
Min / Max	O(n)	O(log n)
Ordered iteration	No	Yes
Space	O(n)	O(n)

The hash-based versions are faster for point queries; the tree-based versions trade that speed for sorted order. You need sorted order when the problem asks for "the smallest unused integer" or "the k-th element in sorted order."

Worst-case for hash tables is O(n) due to collisions, but this is rare with proper load factor management. For interview purposes, O(1) average is the number to cite.

The One Signal That Decides Set vs Map

Read the problem. Then ask one question: do I need to store anything alongside each element?

If no, you're tracking membership. Use a Set.

If yes, you're tracking a relationship. Use a Map.

Five patterns that reliably map to each:

Use a Set when:

Checking whether something has been seen before (membership test)
Removing duplicates from a collection
Computing intersection, union, or difference between two groups
The problem says "find duplicate" or "return unique elements"

Use a Map when:

Counting frequencies (element to count)
Grouping items by a shared property (key to list of items)
Associating an index or position with each value
The problem says "first occurrence", "most frequent", or "return both elements"

Read These Five Problems Like an Interviewer Would

Contains Duplicate (LeetCode 217)

"Given an integer array, return true if any value appears at least twice."

You need: has this number been seen before? No associated data. That's a Set.

def containsDuplicate(nums: list[int]) -> bool:
    seen = set()
    for n in nums:
        if n in seen:
            return True
        seen.add(n)
    return False

Using a dict here works, but wastes a value slot per entry and tells your interviewer you picked the tool you were most comfortable with rather than the right one.

Two Sum (LeetCode 1)

"Return indices of two numbers that add to target."

You need to check whether the complement of each number has appeared, and you need to return its index. The moment you need to return associated data, you need a Map. A Set would answer "have I seen the complement?" It cannot give you the index. You'd need a second data structure to retrieve it, which is just building a Map the hard way.

def twoSum(nums: list[int], target: int) -> list[int]:
    seen = {}  # value -> index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

Intersection of Two Arrays (LeetCode 349)

"Return the intersection of two arrays, each element appearing once."

No associated data. Two Sets, then a built-in intersection.

def intersection(nums1: list[int], nums2: list[int]) -> list[int]:
    return list(set(nums1) & set(nums2))

The & operator computes intersection in O(min(len(s1), len(s2))) time. One line, only possible because Set gives you set algebra for free.

First Unique Character (LeetCode 387)

"Find the first non-repeating character in a string."

Count implies Map (character to count), then a second pass to find the first entry with count 1.

from collections import Counter

def firstUniqChar(s: str) -> int:
    freq = Counter(s)
    for i, ch in enumerate(s):
        if freq[ch] == 1:
            return i
    return -1

Group Anagrams (LeetCode 49)

"Group strings that are anagrams of each other."

You need a canonical form for each group plus a list of every string that shares it. Key to list of values. Map.

from collections import defaultdict

def groupAnagrams(strs: list[str]) -> list[list[str]]:
    groups = defaultdict(list)
    for s in strs:
        key = tuple(sorted(s))
        groups[key].append(s)
    return list(groups.values())

If you reached for a Set here, I respect the ambition. There's just nowhere to put the strings.

When You Actually Need Sorted Order

Sometimes O(1) isn't enough. When the problem requires sorted order, min/max in O(log n), or range queries, you reach for the tree-backed variants.

Use an ordered Set (TreeSet in Java, std::set in C++) when:

You need the smallest or largest element after insertions and deletions
The problem involves "find closest value" or "next greater element in a dynamic set"
You're running a sweep-line algorithm where elements enter and leave in sorted order

Use an ordered Map (TreeMap in Java, std::map in C++) when:

You need to iterate keys in sorted order
You're doing range queries like "all keys between 3 and 7"
The problem involves interval scheduling, calendar conflicts, or skyline queries

The BST vs HashMap tradeoff shows up exactly here: pay O(log n) when you need ordered access, and use O(1) when you don't.

C++ Names These Backwards

Every major language follows the same convention: plain name means hash-backed, tree/sorted means the variation. C++ looked at this convention and went the other direction. Not as a mistake. On purpose.

Structure	C++ Name	Complexity
Hash set	`unordered_set`	O(1) avg
Hash map	`unordered_map`	O(1) avg
Ordered set	`set`	O(log n)
Ordered map	`map`	O(log n)

If you write C++ and reach for set thinking you're getting a hash table, you just paid O(log n) for everything. Your code will run correctly. Just slower. And unordered_set was sitting right there the whole time. Know this before your interview and say it out loud when you reach for the right one.

Language Reference

Language	Hash Set	Hash Map	Ordered Set	Ordered Map
Python	`set`	`dict`	(use `bisect`)	(use `bisect`)
Java	`HashSet`	`HashMap`	`TreeSet`	`TreeMap`
C++	`unordered_set`	`unordered_map`	`set`	`map`
Go	`map[K]struct{}`	`map[K]V`	(no built-in)	(no built-in)
JavaScript	`Set`	`Map`	(no built-in)	(no built-in)
TypeScript	`Set<K>`	`Map<K,V>`	(no built-in)	(no built-in)

Python has no built-in sorted set or sorted map. The bisect module on a sorted list covers most "ordered set" operations in interviews. The sortedcontainers library exists but isn't available on LeetCode without importing manually. Java and C++ are the cleanest choices when the problem needs ordered structures.

The Decision in Ten Seconds

Do I need to store anything alongside each element?
├── No  → Set  (membership, deduplication, set algebra)
└── Yes → Map  (counting, grouping, associating data)

Do I need sorted order or range queries?
├── No  → Hash-backed  → O(1) average
└── Yes → Tree-backed  → O(log n), ordered iteration

Build this before you touch the keyboard and you'll never backtrack because your set can't return an index, or your dict is doing work you didn't ask for.

If you want to practice explaining these choices out loud before you code, SpaceComplexity runs voice-based mock interviews where the rubric scores data structure reasoning as an explicit dimension. "I'm using a Map here because I need the index alongside each value" is worth points before you write a single line.

For a deeper look at how hash map time complexity works under the hood, and when the O(1) guarantee quietly breaks, that's worth reading before your next interview. And if you find yourself choosing between HashSet and TreeSet specifically, the HashSet vs TreeSet breakdown covers the ordered-vs-unordered tradeoff in more detail.