Top 15 LeetCode Problems for Machine Learning Engineer Interviews

- Machine learning engineer interview coding rounds use the same rubric as SWE rounds but tilt toward streaming, matrix, and probabilistic problems that mirror production ML work
- Two-heap pattern (LC 295) is the most common ML-adjacent heap problem, tracking running statistics like training loss and feature drift in O(log n) per insertion
- Reservoir sampling (LC 382) and weighted random sampling (LC 528) appear far more in MLE rounds than SWE rounds; most candidates skip both and pay for it
- Word Break (LC 139) directly models tokenization and byte-pair encoding, making it a frequent Google and Amazon MLE round pick
- Topological sort (LC 210) maps exactly to pipeline DAG resolution in ML training orchestration, the same algorithm Airflow and Kubeflow rely on
- Bucket sort variant of Top K Frequent Elements (LC 347) drops complexity to O(n) and is the signal Meta interviewers watch for in frequency-counting problems
You trained a model on 500 million tokens. You shipped a feature that serves a hundred million users. Your embeddings are tighter than your interviewer's schedule. And now Google wants you to invert a binary tree.
Welcome to the ML engineer coding interview. It's the same bar as a SWE interview. Same rubric, same complexity expectations, same "can you walk me through your reasoning" while you silently panic. The only difference is which problems show up.
And that distribution? It's predictable once you see the pattern.
The Bar Is the Same. The Problem Mix Isn't.
At Google, Meta, and Amazon, ML engineer coding rounds use the same rubric as SWE rounds. You identify the pattern fast, communicate your reasoning, and write clean code. What differs is the tilt.
ML engineer interviews weight streaming, matrix, graph, and probabilistic problems more heavily than general SWE interviews. A lot of what you build in the role involves data pipelines, nearest-neighbor search, graph-structured recommendations, and sampling from distributions. The problems reflect that. There is no easier track. No one is handing you a NumPy question and calling it a day.
The gap isn't difficulty. It's distribution. Here are the 15 problems that show up.
Heaps and Priority Queues: Streaming Statistics and Nearest Neighbors
Heap problems appear more in ML engineer interviews than in SWE loops because streaming statistics are a core building block of production ML systems. See the heap data structure guide if you need to build the mental model before drilling problems.
1. Find Median from Data Stream (LC 295)
The single most common ML-adjacent heap problem. It shows up because tracking running statistics (training loss, feature drift, evaluation scores) is everywhere in deployed ML.
The trick is two heaps. A max-heap holds the lower half. A min-heap holds the upper half. Every insertion goes into the max-heap first, then you immediately move the max-heap's top into the min-heap. If the min-heap grows larger, you move its top back.
import heapq class MedianFinder: def __init__(self): self.lo = [] # max-heap (negate values) self.hi = [] # min-heap def addNum(self, num: int) -> None: heapq.heappush(self.lo, -num) heapq.heappush(self.hi, -heapq.heappop(self.lo)) if len(self.hi) > len(self.lo): heapq.heappush(self.lo, -heapq.heappop(self.hi)) def findMedian(self) -> float: if len(self.lo) > len(self.hi): return -self.lo[0] return (-self.lo[0] + self.hi[0]) / 2.0
Push to max-heap, immediately push the max to min-heap. One conditional rebalance. Most candidates fumble the rebalance direction. Don't be most candidates.
2. K Closest Points to Origin (LC 973)
Distance to origin is the computation behind every k-nearest-neighbor query. You don't need square roots. Distance squared preserves ordering, and skipping the sqrt is the kind of small insight interviewers remember.
Use a max-heap of size k. For each point, if its squared distance is smaller than the heap's current maximum, pop and push. The heap contains the k closest points when you're done. O(n log k), which is exactly what you'd want in production too.
3. Merge K Sorted Lists (LC 23)
Ranking systems and retrieval pipelines merge sorted candidate lists from multiple sources. Each source produces a ranked list. You need the globally merged result efficiently.
Push (node.val, i, node) for each list head into a min-heap. Pop the minimum, append it to the result, push that node's next child. O(N log k). The i tiebreaker prevents comparisons between ListNode objects when values are equal, which Python will happily crash on.
Matrix and Arrays: The Tensor Operations You Actually Encounter
4. Rotate Image (LC 48)
Rotate an n x n matrix 90 degrees clockwise in-place. Transpose first (swap matrix[i][j] and matrix[j][i]), then reverse each row. Two linear passes, zero extra space.
This tests whether you can reason about 2D index transformations. That skill matters whenever you work with spatial data, attention masks, or feature map manipulations. It also just feels satisfying to solve cleanly.
5. Product of Array Except Self (LC 238)
No division allowed. That constraint forces the prefix-suffix trick. First pass: build a prefix product array where prefix[i] holds the product of all elements before index i. Second pass: multiply each position by the running suffix product from the right.
O(n) time, O(1) extra space. This problem appears regularly at Google and Meta. The division ban is a deliberate probe. Candidates who derive the prefix-suffix approach independently give a much stronger signal than those who ask about division first, discover they can't use it, and then stare at the screen.
6. Subarray Sum Equals K (LC 560)
Count subarrays whose sum equals k. The O(n) solution combines prefix sums with a hash map. At each index, you know the prefix sum so far. If prefix - k is already in the map, every earlier prefix that produces that value gives a valid subarray ending here.
def subarraySum(nums: list[int], k: int) -> int: count, prefix, seen = 0, 0, {0: 1} for n in nums: prefix += n count += seen.get(prefix - k, 0) seen[prefix] = seen.get(prefix, 0) + 1 return count
This pattern extends directly to feature window aggregation in time-series preprocessing. The {0: 1} initialization is the kind of detail that separates the "I kind of remember this" from the "I've actually solved it" answers.
Sliding Window and Strings: NLP Preprocessing Patterns
These two problems appear in ML engineer loops because both map to real text-processing tasks. For the full sliding window template, see the sliding window algorithm guide.
7. Minimum Window Substring (LC 76)
Find the shortest substring of s containing all characters of t. Hard difficulty, but Amazon asks it regularly for ML engineer roles.
Two pointers expand the right boundary until the window contains all required characters, then shrink from the left to minimize length. Track a need counter that decrements when a required character count is satisfied. When need reaches zero, the window is valid. O(|s| + |t|). One of those problems where understanding the need counter makes everything click.
8. Word Break (LC 139)
Given a string and a dictionary, determine if the string can be segmented into dictionary words. This is a direct model of tokenization: can you decompose a sequence into known subunits?
dp[i] is True if s[:i] is segmentable. For each position i, check all j less than i: if dp[j] is True and s[j:i] is in the word set, set dp[i] to True. O(n^2) with a set lookup. The connection to byte-pair encoding is direct. This problem shows up in Google and Amazon ML rounds, and if you work on NLP, you should feel slightly embarrassed not knowing it cold.
Trees: Computation Graphs and Model Serialization
9. Binary Tree Maximum Path Sum (LC 124)
The most commonly reported tree problem in Meta and Amazon ML engineer interviews. At each node, the maximum path contribution through that node is node.val + max(0, left_gain) + max(0, right_gain). You update a global maximum with that value but return only node.val + max(left_gain, right_gain, 0) upward because a path cannot fork.
def maxPathSum(root) -> int: res = [root.val] def dfs(node): if not node: return 0 left = max(dfs(node.left), 0) right = max(dfs(node.right), 0) res[0] = max(res[0], node.val + left + right) return node.val + max(left, right) dfs(root) return res[0]
Clamping to zero is the load-bearing line. Without it, you'd include negative subtrees and break the path. A lot of candidates get the structure right and lose the clamp.
10. Serialize and Deserialize Binary Tree (LC 297)
Convert a binary tree to a string and back. Use preorder traversal with '#' as a null marker. During deserialization, a deque of tokens lets you pop from the front in O(1). This problem directly models how decision trees and gradient-boosted ensembles are serialized to disk. Which is funny, because you probably use joblib.dump in practice and never think about it.
Graphs: Pipelines, Recommendations, and Spatial Data
11. Course Schedule II (LC 210)
Return a valid course ordering given prerequisite pairs. This is Kahn's algorithm: build an in-degree map, push all zero-in-degree nodes to a queue, process each and decrement neighbors' in-degrees. If the output length equals n, no cycle exists.
This pattern is identical to resolving a pipeline dependency graph, which appears in every ML training orchestration system. The topological sort guide covers both Kahn's and DFS approaches with full implementations.
12. Number of Islands (LC 200)
Count connected components of '1' cells in a binary grid. BFS or DFS from each unvisited land cell, marking visited as you go. The outer loop is non-optional: you must iterate over every cell and start a new traversal whenever you find an unvisited '1'. Missing the outer loop is the most common failure mode. It's also the kind of bug that passes 80% of test cases and fails in production.
The pattern maps directly to connected component labeling in image segmentation, a foundational operation in computer vision pipelines.
Probability and Sampling: The Problems Most Candidates Skip
Here's the actual gap. These appear far more in ML engineer loops than in general SWE rounds. Most candidates have done zero reservoir sampling practice because they assume it's too niche. It isn't niche. It's the thing Google uses to decide if you understand the probabilistic foundations of your own field.
13. Linked List Random Node (LC 382)
Return a uniformly random node from a linked list of unknown length without building an auxiliary array. The O(1) space solution is reservoir sampling: scan the list and for the i-th node, keep it with probability 1/i, replacing whatever you currently hold.
import random def getRandom(head) -> int: result, node, i = head.val, head.next, 2 while node: if random.randint(1, i) == 1: result = node.val node, i = node.next, i + 1 return result
Reservoir sampling is fundamental to ML data collection, especially when sampling from datasets too large to hold in memory. Google asks this in ML rounds specifically because the correctness proof generalizes to importance sampling and online learning scenarios. If you can explain why this gives uniform probability, you've immediately separated yourself from the candidates who just memorized the pattern.
14. Random Pick with Weight (LC 528)
Pick an index randomly with probability proportional to a given weight array. Build a prefix sum array, draw a random float in [0, total_weight), then binary search for the bucket it falls in.
This maps to weighted random sampling used in curriculum learning, prioritized experience replay in reinforcement learning, and mini-batch construction with class reweighting. If you've ever tuned a loss function, you've implicitly used this mechanism. Might as well know how it works.
Frequency Counting: The Feature Selection Signal
15. Top K Frequent Elements (LC 347)
Return the k most frequent elements. The heap solution runs in O(n log k). The bucket sort solution runs in O(n): create frequency buckets of length n+1 (since max frequency is n), place each element in its bucket, then scan right to left until you have k elements.
This is the frequency-counting template that appears in feature selection, vocabulary construction, and label imbalance analysis. Meta interviewers watch whether you default to the heap or recognize the bucket sort optimization. The bucket sort insight is the difference between a pass and a strong hire signal.
Reference Table
| Pattern | Problem | LC # | Difficulty | ML Connection |
|---|---|---|---|---|
| Heap | Find Median from Data Stream | 295 | Hard | Streaming statistics, feature drift |
| Heap | K Closest Points to Origin | 973 | Medium | k-NN inference |
| Heap | Merge K Sorted Lists | 23 | Hard | Ranking merge, retrieval pipelines |
| Matrix | Rotate Image | 48 | Medium | Tensor transformation |
| Arrays | Product of Array Except Self | 238 | Medium | Leave-one-out computation |
| Arrays | Subarray Sum Equals K | 560 | Medium | Feature window aggregation |
| Sliding window | Minimum Window Substring | 76 | Hard | NLP pattern matching |
| DP / strings | Word Break | 139 | Medium | Tokenization, BPE |
| Trees | Binary Tree Maximum Path Sum | 124 | Hard | Computation graph traversal |
| Trees | Serialize and Deserialize | 297 | Hard | Model serialization |
| Graphs | Course Schedule II | 210 | Medium | Pipeline DAG resolution |
| Graphs | Number of Islands | 200 | Medium | Connected component labeling |
| Sampling | Linked List Random Node | 382 | Medium | Reservoir sampling |
| Sampling | Random Pick with Weight | 528 | Medium | Weighted dataset sampling |
| Frequency | Top K Frequent Elements | 347 | Medium | Feature selection, vocab construction |
What to Focus On
- Heaps, trees, and arrays are the highest-frequency categories. If prep time is short, start there.
- LC 382 and LC 528 appear far more in MLE loops than in generalist SWE rounds. Most candidates skip them entirely. That's your edge.
- Knowing the ML connection behind each problem matters during the interview. Interviewers in ML roles notice when you recognize why a problem is relevant to the work. It's not just a talking point. It's a signal.
- The coding screen is only half the signal. The other half is how you narrate your reasoning live under pressure. SpaceComplexity runs voice-based mock interviews with rubric scoring across communication, problem-solving, code quality, and optimization. Practice the narration, not just the solution. Especially the reservoir sampling explanation. That one is hard to wing.