Set vs Map: When Each Data Structure Is the Right Call

- A Set stores elements; a Map stores key-value pairs — if you only need "has this been seen?", reach for the Set
- Java's
HashSetis literally aHashMapwith a dummy value — Go's idiomatic Set ismap[K]struct{} - The deciding question: do I need to store anything alongside each element? No = Set, Yes = Map
- Hash-backed (O(1) avg) vs tree-backed (O(log n), sorted) — use tree variants only when you need ordered iteration or range queries
- C++ names these backwards —
setandmapare tree-backed;unordered_setandunordered_mapare the O(1) hash versions - Five LeetCode patterns mapped: Contains Duplicate (Set), Two Sum (Map), Intersection (Set), First Unique Char (Map), Group Anagrams (Map)
Both live behind a hash table. Both give you O(1) lookups. The distinction takes about ten seconds to internalize, and then you'll spend the next six months quietly judging code that uses a dict when it only ever checks keys. Pick the wrong one and your code still works. That's the problem. The bug is invisible, but the interviewer noticed.
The Core Difference Is What You're Storing
A Set stores elements. You put a value in, and later you ask: "is this value here?"
A Map stores pairs. Every key maps to a value. You put a key-value pair in, and later you ask: "what value does this key map to?"
A Set is a Map where the value doesn't matter. Most language implementations make this literal. Java's HashSet is backed by a HashMap<E, Object> with a static PRESENT placeholder as the value. A dummy object, allocated once, pointed to by every single entry forever. Technically correct. Slightly unsettling. Go has no built-in Set, so the idiomatic pattern is map[K]struct{}, using an empty struct because it consumes zero bytes. It's the most Go solution imaginable: correct, terse, and just uncomfortable enough to be memorable. Python's set does it cleanly, storing only keys with no dummy value at all.
Whenever you catch yourself building a Map but only ever using the keys, you wanted a Set.
What the Complexity Table Actually Tells You
| Operation | HashSet / HashMap | TreeSet / TreeMap |
|---|---|---|
| Insert | O(1) avg | O(log n) |
| Delete | O(1) avg | O(log n) |
| Lookup | O(1) avg | O(log n) |
| Min / Max | O(n) | O(log n) |
| Ordered iteration | No | Yes |
| Space | O(n) | O(n) |
The hash-based versions are faster for point queries; the tree-based versions trade that speed for sorted order. You need sorted order when the problem asks for "the smallest unused integer" or "the k-th element in sorted order."
Worst-case for hash tables is O(n) due to collisions, but this is rare with proper load factor management. For interview purposes, O(1) average is the number to cite.
The One Signal That Decides Set vs Map
Read the problem. Then ask one question: do I need to store anything alongside each element?
If no, you're tracking membership. Use a Set.
If yes, you're tracking a relationship. Use a Map.
Five patterns that reliably map to each:
Use a Set when:
- Checking whether something has been seen before (membership test)
- Removing duplicates from a collection
- Computing intersection, union, or difference between two groups
- The problem says "find duplicate" or "return unique elements"
Use a Map when:
- Counting frequencies (element to count)
- Grouping items by a shared property (key to list of items)
- Associating an index or position with each value
- The problem says "first occurrence", "most frequent", or "return both elements"
Read These Five Problems Like an Interviewer Would
Contains Duplicate (LeetCode 217)
"Given an integer array, return true if any value appears at least twice."
You need: has this number been seen before? No associated data. That's a Set.
def containsDuplicate(nums: list[int]) -> bool: seen = set() for n in nums: if n in seen: return True seen.add(n) return False
Using a dict here works, but wastes a value slot per entry and tells your interviewer you picked the tool you were most comfortable with rather than the right one.
Two Sum (LeetCode 1)
"Return indices of two numbers that add to target."
You need to check whether the complement of each number has appeared, and you need to return its index. The moment you need to return associated data, you need a Map. A Set would answer "have I seen the complement?" It cannot give you the index. You'd need a second data structure to retrieve it, which is just building a Map the hard way.
def twoSum(nums: list[int], target: int) -> list[int]: seen = {} # value -> index for i, n in enumerate(nums): complement = target - n if complement in seen: return [seen[complement], i] seen[n] = i return []
Intersection of Two Arrays (LeetCode 349)
"Return the intersection of two arrays, each element appearing once."
No associated data. Two Sets, then a built-in intersection.
def intersection(nums1: list[int], nums2: list[int]) -> list[int]: return list(set(nums1) & set(nums2))
The & operator computes intersection in O(min(len(s1), len(s2))) time. One line, only possible because Set gives you set algebra for free.
First Unique Character (LeetCode 387)
"Find the first non-repeating character in a string."
Count implies Map (character to count), then a second pass to find the first entry with count 1.
from collections import Counter def firstUniqChar(s: str) -> int: freq = Counter(s) for i, ch in enumerate(s): if freq[ch] == 1: return i return -1
Group Anagrams (LeetCode 49)
"Group strings that are anagrams of each other."
You need a canonical form for each group plus a list of every string that shares it. Key to list of values. Map.
from collections import defaultdict def groupAnagrams(strs: list[str]) -> list[list[str]]: groups = defaultdict(list) for s in strs: key = tuple(sorted(s)) groups[key].append(s) return list(groups.values())
If you reached for a Set here, I respect the ambition. There's just nowhere to put the strings.
When You Actually Need Sorted Order
Sometimes O(1) isn't enough. When the problem requires sorted order, min/max in O(log n), or range queries, you reach for the tree-backed variants.
Use an ordered Set (TreeSet in Java, std::set in C++) when:
- You need the smallest or largest element after insertions and deletions
- The problem involves "find closest value" or "next greater element in a dynamic set"
- You're running a sweep-line algorithm where elements enter and leave in sorted order
Use an ordered Map (TreeMap in Java, std::map in C++) when:
- You need to iterate keys in sorted order
- You're doing range queries like "all keys between 3 and 7"
- The problem involves interval scheduling, calendar conflicts, or skyline queries
The BST vs HashMap tradeoff shows up exactly here: pay O(log n) when you need ordered access, and use O(1) when you don't.
C++ Names These Backwards
Every major language follows the same convention: plain name means hash-backed, tree/sorted means the variation. C++ looked at this convention and went the other direction. Not as a mistake. On purpose.
| Structure | C++ Name | Complexity |
|---|---|---|
| Hash set | unordered_set | O(1) avg |
| Hash map | unordered_map | O(1) avg |
| Ordered set | set | O(log n) |
| Ordered map | map | O(log n) |
If you write C++ and reach for set thinking you're getting a hash table, you just paid O(log n) for everything. Your code will run correctly. Just slower. And unordered_set was sitting right there the whole time. Know this before your interview and say it out loud when you reach for the right one.
Language Reference
| Language | Hash Set | Hash Map | Ordered Set | Ordered Map |
|---|---|---|---|---|
| Python | set | dict | (use bisect) | (use bisect) |
| Java | HashSet | HashMap | TreeSet | TreeMap |
| C++ | unordered_set | unordered_map | set | map |
| Go | map[K]struct{} | map[K]V | (no built-in) | (no built-in) |
| JavaScript | Set | Map | (no built-in) | (no built-in) |
| TypeScript | Set<K> | Map<K,V> | (no built-in) | (no built-in) |
Python has no built-in sorted set or sorted map. The bisect module on a sorted list covers most "ordered set" operations in interviews. The sortedcontainers library exists but isn't available on LeetCode without importing manually. Java and C++ are the cleanest choices when the problem needs ordered structures.
The Decision in Ten Seconds
Do I need to store anything alongside each element?
├── No → Set (membership, deduplication, set algebra)
└── Yes → Map (counting, grouping, associating data)
Do I need sorted order or range queries?
├── No → Hash-backed → O(1) average
└── Yes → Tree-backed → O(log n), ordered iteration
Build this before you touch the keyboard and you'll never backtrack because your set can't return an index, or your dict is doing work you didn't ask for.
If you want to practice explaining these choices out loud before you code, SpaceComplexity runs voice-based mock interviews where the rubric scores data structure reasoning as an explicit dimension. "I'm using a Map here because I need the index alongside each value" is worth points before you write a single line.
For a deeper look at how hash map time complexity works under the hood, and when the O(1) guarantee quietly breaks, that's worth reading before your next interview. And if you find yourself choosing between HashSet and TreeSet specifically, the HashSet vs TreeSet breakdown covers the ordered-vs-unordered tradeoff in more detail.
Further Reading
- Hash table, Wikipedia
- Red-black tree, Wikipedia, the backing structure for Java's TreeSet/TreeMap and C++'s
set/map - Python
settypes, python.org - Java TreeMap, docs.oracle.com
- C++
std::unordered_map, cppreference.com