Top 15 LeetCode Problems for Machine Learning Engineer Interviews

You trained a model on 500 million tokens. You shipped a feature that serves a hundred million users. Your embeddings are tighter than your interviewer's schedule. And now Google wants you to invert a binary tree.

Welcome to the ML engineer coding interview. It's the same bar as a SWE interview. Same rubric, same complexity expectations, same "can you walk me through your reasoning" while you silently panic. The only difference is which problems show up.

And that distribution? It's predictable once you see the pattern.

The Bar Is the Same. The Problem Mix Isn't.

At Google, Meta, and Amazon, ML engineer coding rounds use the same rubric as SWE rounds. You identify the pattern fast, communicate your reasoning, and write clean code. What differs is the tilt.

ML engineer interviews weight streaming, matrix, graph, and probabilistic problems more heavily than general SWE interviews. A lot of what you build in the role involves data pipelines, nearest-neighbor search, graph-structured recommendations, and sampling from distributions. The problems reflect that. There is no easier track. No one is handing you a NumPy question and calling it a day.

The gap isn't difficulty. It's distribution. Here are the 15 problems that show up.

Heaps and Priority Queues: Streaming Statistics and Nearest Neighbors

Heap problems appear more in ML engineer interviews than in SWE loops because streaming statistics are a core building block of production ML systems. See the heap data structure guide if you need to build the mental model before drilling problems.

1. Find Median from Data Stream (LC 295)

The single most common ML-adjacent heap problem. It shows up because tracking running statistics (training loss, feature drift, evaluation scores) is everywhere in deployed ML.

The trick is two heaps. A max-heap holds the lower half. A min-heap holds the upper half. Every insertion goes into the max-heap first, then you immediately move the max-heap's top into the min-heap. If the min-heap grows larger, you move its top back.

import heapq

class MedianFinder:
    def __init__(self):
        self.lo = []   # max-heap (negate values)
        self.hi = []   # min-heap

    def addNum(self, num: int) -> None:
        heapq.heappush(self.lo, -num)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def findMedian(self) -> float:
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2.0

Push to max-heap, immediately push the max to min-heap. One conditional rebalance. Most candidates fumble the rebalance direction. Don't be most candidates.

2. K Closest Points to Origin (LC 973)

Distance to origin is the computation behind every k-nearest-neighbor query. You don't need square roots. Distance squared preserves ordering, and skipping the sqrt is the kind of small insight interviewers remember.

Use a max-heap of size k. For each point, if its squared distance is smaller than the heap's current maximum, pop and push. The heap contains the k closest points when you're done. O(n log k), which is exactly what you'd want in production too.

3. Merge K Sorted Lists (LC 23)

Ranking systems and retrieval pipelines merge sorted candidate lists from multiple sources. Each source produces a ranked list. You need the globally merged result efficiently.

Push (node.val, i, node) for each list head into a min-heap. Pop the minimum, append it to the result, push that node's next child. O(N log k). The i tiebreaker prevents comparisons between ListNode objects when values are equal, which Python will happily crash on.

Matrix and Arrays: The Tensor Operations You Actually Encounter

4. Rotate Image (LC 48)

Rotate an n x n matrix 90 degrees clockwise in-place. Transpose first (swap matrix[i][j] and matrix[j][i]), then reverse each row. Two linear passes, zero extra space.

This tests whether you can reason about 2D index transformations. That skill matters whenever you work with spatial data, attention masks, or feature map manipulations. It also just feels satisfying to solve cleanly.

5. Product of Array Except Self (LC 238)

No division allowed. That constraint forces the prefix-suffix trick. First pass: build a prefix product array where prefix[i] holds the product of all elements before index i. Second pass: multiply each position by the running suffix product from the right.

O(n) time, O(1) extra space. This problem appears regularly at Google and Meta. The division ban is a deliberate probe. Candidates who derive the prefix-suffix approach independently give a much stronger signal than those who ask about division first, discover they can't use it, and then stare at the screen.

6. Subarray Sum Equals K (LC 560)

Count subarrays whose sum equals k. The O(n) solution combines prefix sums with a hash map. At each index, you know the prefix sum so far. If prefix - k is already in the map, every earlier prefix that produces that value gives a valid subarray ending here.

def subarraySum(nums: list[int], k: int) -> int:
    count, prefix, seen = 0, 0, {0: 1}
    for n in nums:
        prefix += n
        count += seen.get(prefix - k, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return count

This pattern extends directly to feature window aggregation in time-series preprocessing. The {0: 1} initialization is the kind of detail that separates the "I kind of remember this" from the "I've actually solved it" answers.

Sliding Window and Strings: NLP Preprocessing Patterns

These two problems appear in ML engineer loops because both map to real text-processing tasks. For the full sliding window template, see the sliding window algorithm guide.

7. Minimum Window Substring (LC 76)

Find the shortest substring of s containing all characters of t. Hard difficulty, but Amazon asks it regularly for ML engineer roles.

Two pointers expand the right boundary until the window contains all required characters, then shrink from the left to minimize length. Track a need counter that decrements when a required character count is satisfied. When need reaches zero, the window is valid. O(|s| + |t|). One of those problems where understanding the need counter makes everything click.

8. Word Break (LC 139)

Given a string and a dictionary, determine if the string can be segmented into dictionary words. This is a direct model of tokenization: can you decompose a sequence into known subunits?

dp[i] is True if s[:i] is segmentable. For each position i, check all j less than i: if dp[j] is True and s[j:i] is in the word set, set dp[i] to True. O(n^2) with a set lookup. The connection to byte-pair encoding is direct. This problem shows up in Google and Amazon ML rounds, and if you work on NLP, you should feel slightly embarrassed not knowing it cold.

Trees: Computation Graphs and Model Serialization

9. Binary Tree Maximum Path Sum (LC 124)

The most commonly reported tree problem in Meta and Amazon ML engineer interviews. At each node, the maximum path contribution through that node is node.val + max(0, left_gain) + max(0, right_gain). You update a global maximum with that value but return only node.val + max(left_gain, right_gain, 0) upward because a path cannot fork.

def maxPathSum(root) -> int:
    res = [root.val]

    def dfs(node):
        if not node:
            return 0
        left = max(dfs(node.left), 0)
        right = max(dfs(node.right), 0)
        res[0] = max(res[0], node.val + left + right)
        return node.val + max(left, right)

    dfs(root)
    return res[0]

Clamping to zero is the load-bearing line. Without it, you'd include negative subtrees and break the path. A lot of candidates get the structure right and lose the clamp.

10. Serialize and Deserialize Binary Tree (LC 297)

Convert a binary tree to a string and back. Use preorder traversal with '#' as a null marker. During deserialization, a deque of tokens lets you pop from the front in O(1). This problem directly models how decision trees and gradient-boosted ensembles are serialized to disk. Which is funny, because you probably use joblib.dump in practice and never think about it.

Graphs: Pipelines, Recommendations, and Spatial Data

11. Course Schedule II (LC 210)

Return a valid course ordering given prerequisite pairs. This is Kahn's algorithm: build an in-degree map, push all zero-in-degree nodes to a queue, process each and decrement neighbors' in-degrees. If the output length equals n, no cycle exists.

This pattern is identical to resolving a pipeline dependency graph, which appears in every ML training orchestration system. The topological sort guide covers both Kahn's and DFS approaches with full implementations.

12. Number of Islands (LC 200)

Count connected components of '1' cells in a binary grid. BFS or DFS from each unvisited land cell, marking visited as you go. The outer loop is non-optional: you must iterate over every cell and start a new traversal whenever you find an unvisited '1'. Missing the outer loop is the most common failure mode. It's also the kind of bug that passes 80% of test cases and fails in production.

The pattern maps directly to connected component labeling in image segmentation, a foundational operation in computer vision pipelines.

Probability and Sampling: The Problems Most Candidates Skip

Here's the actual gap. These appear far more in ML engineer loops than in general SWE rounds. Most candidates have done zero reservoir sampling practice because they assume it's too niche. It isn't niche. It's the thing Google uses to decide if you understand the probabilistic foundations of your own field.

13. Linked List Random Node (LC 382)

Return a uniformly random node from a linked list of unknown length without building an auxiliary array. The O(1) space solution is reservoir sampling: scan the list and for the i-th node, keep it with probability 1/i, replacing whatever you currently hold.

import random

def getRandom(head) -> int:
    result, node, i = head.val, head.next, 2
    while node:
        if random.randint(1, i) == 1:
            result = node.val
        node, i = node.next, i + 1
    return result

Reservoir sampling is fundamental to ML data collection, especially when sampling from datasets too large to hold in memory. Google asks this in ML rounds specifically because the correctness proof generalizes to importance sampling and online learning scenarios. If you can explain why this gives uniform probability, you've immediately separated yourself from the candidates who just memorized the pattern.

14. Random Pick with Weight (LC 528)

Pick an index randomly with probability proportional to a given weight array. Build a prefix sum array, draw a random float in [0, total_weight), then binary search for the bucket it falls in.

This maps to weighted random sampling used in curriculum learning, prioritized experience replay in reinforcement learning, and mini-batch construction with class reweighting. If you've ever tuned a loss function, you've implicitly used this mechanism. Might as well know how it works.

Frequency Counting: The Feature Selection Signal

15. Top K Frequent Elements (LC 347)

Return the k most frequent elements. The heap solution runs in O(n log k). The bucket sort solution runs in O(n): create frequency buckets of length n+1 (since max frequency is n), place each element in its bucket, then scan right to left until you have k elements.

This is the frequency-counting template that appears in feature selection, vocabulary construction, and label imbalance analysis. Meta interviewers watch whether you default to the heap or recognize the bucket sort optimization. The bucket sort insight is the difference between a pass and a strong hire signal.

Reference Table

Pattern	Problem	LC #	Difficulty	ML Connection
Heap	Find Median from Data Stream	295	Hard	Streaming statistics, feature drift
Heap	K Closest Points to Origin	973	Medium	k-NN inference
Heap	Merge K Sorted Lists	23	Hard	Ranking merge, retrieval pipelines
Matrix	Rotate Image	48	Medium	Tensor transformation
Arrays	Product of Array Except Self	238	Medium	Leave-one-out computation
Arrays	Subarray Sum Equals K	560	Medium	Feature window aggregation
Sliding window	Minimum Window Substring	76	Hard	NLP pattern matching
DP / strings	Word Break	139	Medium	Tokenization, BPE
Trees	Binary Tree Maximum Path Sum	124	Hard	Computation graph traversal
Trees	Serialize and Deserialize	297	Hard	Model serialization
Graphs	Course Schedule II	210	Medium	Pipeline DAG resolution
Graphs	Number of Islands	200	Medium	Connected component labeling
Sampling	Linked List Random Node	382	Medium	Reservoir sampling
Sampling	Random Pick with Weight	528	Medium	Weighted dataset sampling
Frequency	Top K Frequent Elements	347	Medium	Feature selection, vocab construction

What to Focus On

Heaps, trees, and arrays are the highest-frequency categories. If prep time is short, start there.
LC 382 and LC 528 appear far more in MLE loops than in generalist SWE rounds. Most candidates skip them entirely. That's your edge.
Knowing the ML connection behind each problem matters during the interview. Interviewers in ML roles notice when you recognize why a problem is relevant to the work. It's not just a talking point. It's a signal.
The coding screen is only half the signal. The other half is how you narrate your reasoning live under pressure. SpaceComplexity runs voice-based mock interviews with rubric scoring across communication, problem-solving, code quality, and optimization. Practice the narration, not just the solution. Especially the reservoir sampling explanation. That one is hard to wing.