The Greedy Algorithm: When Local Optimal Means Global Optimal

You are at a broken vending machine. It only gives change in the fewest coins possible. You type in $0.41, it spits back a quarter, a dime, a nickel, and a penny. Four coins. Optimal. It does this by always picking the largest denomination that fits the remaining amount. No backtracking. No planning ahead. Just pure, local greed, repeated until done.

That works for U.S. currency. Change the coin set to {1, 3, 4} and try to make 6. The greedy algorithm picks 4 first, then 1, then 1. Three coins. Optimal is two (3 + 3). The vending machine would be wrong.

This is the whole story of greedy algorithms: when the locally optimal choice at every step adds up to a globally optimal solution, and when it quietly doesn't. Understanding that line is what separates a candidate who guesses "maybe greedy?" from one who can prove it.

What Makes Greedy, Greedy

A greedy algorithm makes a sequence of choices. At each step, it picks the option that looks best right now, commits to it, and never looks back. No memoization. No exploring alternatives. No regret. It commits the way you commit to a restaurant order when the waiter is standing right there.

The mental model: imagine you are building a solution piece by piece, and at every junction you pick the locally optimal piece. The question is whether those pieces always fit together into a globally optimal whole.

Most of the time in an interview, greedy is either obviously correct (and you just have to prove it) or subtly wrong (and you need to catch it before you code it up). The danger zone is the middle case, where it feels correct, you code it, and the interviewer produces a counterexample. Politely. With that very specific calm that means they saw it coming.

The Two Properties That Make Greedy Work

Every provably correct greedy algorithm relies on two structural properties of the problem. Without both, you are guessing.

The Greedy-Choice Property

A globally optimal solution can always be constructed to include the greedy choice. This is not the same as saying the greedy choice is the only optimal first move. It means there exists an optimal solution that starts (or proceeds) the same way greedy does.

This property is what makes exchange arguments work: if you can always swap the greedy choice into any optimal solution without making it worse, the greedy choice is safe. In an interview, this is your justification. Without it, you are arguing by vibes, and vibes fail on edge cases.

Optimal Substructure

After making the greedy choice, the remaining subproblem has the same structure, and its optimal solution extends to a global optimum. This is shared with dynamic programming. The difference is that in greedy, you never need to evaluate more than one subproblem per step because the greedy choice always points you to the right one.

Without optimal substructure, solving the remainder optimally might not help because the earlier choice already damaged the global picture.

If both properties hold, greedy is correct. If either fails, you need DP or something heavier.

How Greedy Thinks: The Step-by-Step Mechanism

Input: problem with n items/choices
Sort or prioritize by some greedy criterion
current_state = initial state
solution = empty

for each candidate (in greedy order):
    if candidate is feasible given current_state:
        add candidate to solution
        update current_state

return solution

That loop is the entire algorithm for most greedy problems. The intelligence is in the sort criterion. Pick the right one and the loop produces an optimal solution. Pick the wrong one and it produces a plausible-looking wrong answer that passes your first three examples and fails on the fourth.

The diagram below shows the key structural insight: greedy is not exploring a tree of possibilities. It follows a single path.

Decision tree (what DP explores):
          start
         /     \
      A         B
     / \       / \
    C   D     E   F

Greedy (single path, committed choices):
start -> best local -> best local -> ... -> done

Greedy commits to one path while DP explores every branch of the decision tree Left: DP must evaluate every branch. Right: greedy follows exactly one path and never looks back.

Because greedy walks a single path, it is typically O(n log n) (dominated by the initial sort) while DP is O(n²) or O(n * capacity). That speed difference is real. A greedy solution is usually simple to code but requires genuine understanding to justify, which is exactly why interviewers reach for these problems.

Canonical Greedy Problems and Their Complexities

Problem	Greedy Criterion	Time	Space	Notes
Activity Selection (Interval Scheduling)	Earliest finish time	O(n log n)	O(n)	Optimal number of non-overlapping intervals
Minimize Maximum Lateness	Earliest deadline first	O(n log n)	O(n)	Exchange argument proof
Fractional Knapsack	Max value/weight ratio	O(n log n)	O(1)	Splitting items makes greedy valid
Huffman Coding	Merge two lowest-frequency nodes	O(n log n)	O(n)	Priority queue; optimal prefix code
Minimum Spanning Tree (Kruskal)	Min-weight edge not forming a cycle	O(E log E)	O(V)	Cut property justifies each addition
Minimum Spanning Tree (Prim)	Min-weight edge leaving current tree	O(E log V)	O(V)	Same cut property, different structure
Dijkstra's Shortest Path	Min-distance unvisited node	O((V+E) log V)	O(V)	Requires non-negative edge weights
Job Scheduling (maximize value)	This one needs DP	O(n²) or O(n log n)	O(n)	Greedy fails; weights break locality

The pattern in the "time" column: O(n log n) dominates because sorting is almost always the setup step. The actual selection loop is linear.

Proving Greedy Correct: Two Techniques

This is the part most candidates skip. Saying "greedy works because at each step we pick the best option" is not a proof. It is a restatement of the algorithm. You need to show why the locally optimal sequence produces a globally optimal result.

There are two standard proof techniques. Most greedy correctness arguments use one or the other.

Greedy Stays Ahead

The idea: define a measure of how good a partial solution is. Show that after every step, the greedy partial solution is at least as good as any other partial solution on that measure. If greedy is always "ahead" or tied, it cannot end up worse than optimal.

Interval scheduling is the classic example. The greedy algorithm: sort intervals by finish time, greedily select the earliest-finishing interval that does not conflict with the last selected.

Claim: for any k, the k-th interval selected by greedy finishes no later than the k-th interval in any other feasible solution.

Proof by induction. Base case: the first interval greedy picks is the one with the globally earliest finish time. No solution can start with a different interval that finishes earlier, by definition. So greedy is tied or ahead after step 1.

Inductive step: assume greedy's k-th choice finishes at time f_k, and the optimal solution's k-th choice finishes at some time f_k, and by hypothesis f_k <= fk. The (k+1)-th interval in the optimal solution starts at some time s*{k+1} >= f*_k >= f_k. So it is also compatible with greedy's solution at this point. But greedy picks the earliest-finishing compatible interval, so greedy's (k+1)-th choice finishes no later than optimal's (k+1)-th choice.

By induction, greedy always stays ahead. If greedy selects r intervals and optimal selects r intervals, then greedy's r-th interval finishes no later than optimal's r-th. If r < r, then optimal has an (r+1)-th interval starting after optimal's r-th end, which is after greedy's r-th end, meaning it does not conflict with anything greedy selected. But greedy would have selected it. Contradiction. So r >= r*, meaning greedy is optimal.**

The Exchange Argument

The idea: take any optimal solution. Show that you can repeatedly swap its choices for the greedy choices, one at a time, without making the solution worse. After all the swaps, you have the greedy solution, which is therefore also optimal.

Scheduling to minimize lateness is the textbook example. Jobs have deadlines d_i. Lateness of a job is max(0, finish_time - d_i). Goal: schedule all jobs to minimize the maximum lateness across all jobs.

Greedy: sort jobs by deadline, earliest deadline first.

Proof: suppose an optimal schedule has an inversion: job A comes before job B, but d_A > d_B (A has a later deadline but is scheduled first). Swapping A and B cannot increase the maximum lateness. Why? B now finishes earlier, so its lateness can only decrease or stay the same. A now finishes where B used to, and since d_A > d_B, A's lateness is no worse than B's was before the swap. So the swap does not increase maximum lateness.

Keep swapping inversions until there are none left. You now have the earliest-deadline-first schedule, and each swap preserved or improved the solution. Therefore, the greedy schedule is optimal.

For Huffman coding, the exchange argument shows that the two lowest-frequency characters must appear as siblings at the deepest level of any optimal code tree. If they are not, swapping them into those positions cannot increase the total code length, because placing higher-frequency characters deeper only makes things worse. This establishes the greedy choice: always merge the two minimum-frequency nodes.

The Canonical Greedy Problems in Depth

Activity Selection: Pick the Most Meetings

You have n meetings, each with a start and finish time. You want to attend as many as possible, with no overlap.

Meetings:  [1,4]  [2,5]  [3,6]  [5,7]  [6,9]  [8,10]

Sort by end:  [1,4]  [2,5]  [3,6]  [5,7]  [6,9]  [8,10]

Step 1: Pick [1,4]. last_end = 4.
Step 2: [2,5] starts at 2 < 4. Skip.
Step 3: [3,6] starts at 3 < 4. Skip.
Step 4: [5,7] starts at 5 >= 4. Pick it. last_end = 7.
Step 5: [6,9] starts at 6 < 7. Skip.
Step 6: [8,10] starts at 8 >= 7. Pick it. last_end = 10.

Result: [1,4], [5,7], [8,10], 3 meetings. Optimal.

Activity selection: sort by finish time, pick greedily. Selected intervals in blue, skipped in gray. Sort by finish time. Each selected interval (blue) leaves the most room for what comes next.

Why earliest finish time and not, say, earliest start time? Consider meetings [0,10], [1,2], [3,4]. Earliest start picks [0,10] first, blocking everything else. One meeting total. Earliest finish picks [1,2] first, leaving room for more. The finish time criterion is what "leaves the most room for future choices."

Fractional Knapsack: You Can Cut Items

Items have a weight and a value. Knapsack has a capacity. You want to maximize total value. The critical twist: you can take fractional items.

Sort by value/weight ratio. Take the highest-ratio item fully, then the next, until the bag is full. If the next item does not fit whole, take as much of it as fits.

This is optimal because at any capacity, the highest value-density item is the best use of that capacity. You are never blocked by indivisibility, so the greedy density argument holds exactly.

Remove the ability to split items (the 0/1 knapsack) and the argument breaks. A high-density item might block you from two medium-density items that together fit and beat it.

Huffman Coding: Shorter Codes for Frequent Letters

Given a set of characters and their frequencies, build a binary prefix code that minimizes total encoded length. Characters that appear often get short codes; rare ones get long codes.

Frequencies: A=45, B=13, C=12, D=16, E=9, F=5

Build a min-heap by frequency.
Merge F(5) and E(9) -> FE(14)
Merge B(13) and C(12) -> BC(25)
Merge D(16) and FE(14) -> DFE(30)
Merge BC(25) and DFE(30) -> BCDFE(55)
Merge BCDFE(55) and A(45) -> root(100)

A gets code 0 (1 bit)
B gets 100 (3 bits)
C gets 101 (3 bits)
D gets 110 (3 bits)
E gets 1110 (4 bits)
F gets 1111 (4 bits)

Total bits: 451 + 133 + 123 + 163 + 94 + 54 = 224. Optimal.

Huffman tree: merge two lowest-frequency nodes at each step. High-frequency characters get shorter codes. A gets a 1-bit code because it appears most often. F and E are rarest, so they sink to the deepest leaves with 4-bit codes.

The greedy choice (merge the two smallest) works because placing the rarest characters deepest minimizes their contribution to total length. The exchange argument shows no rearrangement can do better.

When the Greedy Algorithm Fails

The hardest part of greedy is not implementing it. It is knowing when not to use it. The sort and the scan are ten lines. The judgment call is the actual interview.

The Coin Change Trap

Make change for amount A using the fewest coins from set S. For standard denominations like {1, 5, 10, 25}, greedy (always pick the largest coin that fits) is provably optimal. These denominations are called "canonical."

For S = {1, 3, 4}, amount = 6:

Greedy: pick 4, then 1, then 1. Three coins.
Optimal: pick 3, then 3. Two coins.

Greedy fails because picking 4 leaves a remainder of 2, which 3 cannot divide into. The local choice (4 is the biggest coin under 6) created a subproblem {1, 3, 4} for amount 2 where no good solution exists. DP explores all remainders and finds the path that actually works. Greedy committed to 4 and paid for it.

The test: can a greedy choice "poison" the remainder? If yes, you need DP.

The 0/1 Knapsack Trap

Capacity = 50. Items: A(w=10, v=60), B(w=20, v=100), C(w=30, v=120).

Value/weight ratios: A=6.0, B=5.0, C=4.0.

Greedy: take A (v=60, w=10). Take B (v=100, w=20). Remaining capacity is 20. C weighs 30, so it does not fit. Total: 160.

Optimal: take B and C. Total: 220.

Greedy picked A first because it had the best ratio, but A's 10-unit contribution left a 20-unit gap that made C unreachable. In fractional knapsack, you would have taken 2/3 of C and hit 220. The indivisibility constraint is what breaks greedy.

Weighted Job Scheduling

Jobs have start, finish, and weight. Maximize total weight of non-overlapping selected jobs. Greedy by weight fails because a single heavy job might block several medium-weight jobs that together exceed it. There is no local criterion that reliably avoids this. You need DP with binary search on start times, for O(n log n).

The Three-Question Greedy Diagnostic

Before you write the sort, run this checklist:

Can making the locally best choice now block a globally better outcome later? (Greedy fails if yes)
After the greedy choice, is the remaining subproblem identical in structure? (Greedy needs this)
Can you sketch a proof, even an informal one? Greedy without a proof is a hunch. Hunches don't get rubric points.

What Problems Greedy Solves

Greedy is the right tool for:

Scheduling and interval problems: maximize the number of non-overlapping events, minimize lateness, assign tasks to machines
Minimum spanning trees: Kruskal and Prim, both justified by the cut property (any edge that is the lightest crossing some partition is in some MST)
Shortest paths with non-negative weights: Dijkstra's is a greedy algorithm; it always commits to the closest unvisited node
Prefix coding: Huffman; any time you are building optimal variable-length codes from frequencies
Fractional selection: any problem where you can take partial items, greedy by density is often optimal
Greedy on sorted input: many interview problems become obvious once you sort by the right criterion and scan once

Recognizing the Pattern in Interviews

The signals that a problem might be greedy:

The word "minimum" or "maximum" appears in the problem statement
You are selecting from a set and the order of selection matters
The problem is over intervals, deadlines, or sorted sequences
A small example that you solve by hand reveals an obvious sorting criterion
Brute force requires exploring exponentially many subsets

The signals that greedy might fail:

Items or choices cannot be split (indivisible units)
One choice blocks future choices in a non-obvious way
Weights, values, or profits are heterogeneous (not uniform)
Small counterexamples break the obvious greedy strategy

The most reliable interview heuristic: sort by the natural criterion, trace through two or three examples by hand, and look for a counterexample before you code.

Worked Example 1: Jump Game (LeetCode 55)

Problem: given an array where each element is the maximum jump length from that position, determine if you can reach the last index from index 0.

Greedy works here. At each position, track the farthest index reachable from anywhere up to the current position. If you ever find that the current index exceeds your farthest reach, you are stuck.

def can_jump(nums: list[int]) -> bool:
    farthest = 0
    for i, jump in enumerate(nums):
        if i > farthest:
            return False
        farthest = max(farthest, i + jump)
    return True

Why greedy works: at each step, maintaining the maximum reachable index is the only thing that matters. Any position behind it is already reachable. Any position beyond it is not. There is no "saving jumps" for later. The greedy choice (update max reach) is exactly right, and you never need to revisit it.

This satisfies both properties: the greedy choice (track farthest) is safe, and the subproblem after each step has the same structure.

Worked Example 2: Non-overlapping Intervals (LeetCode 435)

Problem: given a list of intervals, find the minimum number of intervals to remove to make the rest non-overlapping.

This is the interval scheduling problem flipped. Instead of maximizing selected intervals, you minimize removed ones. Maximum non-overlapping subset size is n minus minimum removals, so the same greedy applies: sort by end time, greedily keep the earliest-finishing non-overlapping interval.

def erase_overlap_intervals(intervals: list[list[int]]) -> int:
    if not intervals:
        return 0
    intervals.sort(key=lambda x: x[1])
    kept = 1
    last_end = intervals[0][1]
    for start, end in intervals[1:]:
        if start >= last_end:
            kept += 1
            last_end = end
    return len(intervals) - kept

The greedy-stays-ahead proof applies directly: earliest finish time always leaves the most room, so the maximum compatible set is achieved by this criterion.

One Algorithm, Every Language

Activity selection (interval scheduling): given intervals as [start, end] pairs, return the maximum set of non-overlapping intervals. Sort by end time, scan once.

def activity_selection(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    intervals.sort(key=lambda x: x[1])
    selected: list[tuple[int, int]] = []
    last_end = -1
    for start, end in intervals:
        if start >= last_end:
            selected.append((start, end))
            last_end = end
    return selected

Greedy vs. DP: The Decision in Practice

When you see an optimization problem in an interview, the rough mental decision tree is:

Is there a natural greedy criterion (sort by X)?
    Yes -> Does a greedy choice ever "poison" future choices?
        No  -> Try greedy. Find the proof.
        Yes -> Use DP.
    No  -> Almost certainly DP or search.

The clearest signal that greedy will fail: the problem involves weights, values, or profits that interact in non-uniform ways. Uniform problems (every job has the same weight, every item has the same value) are often greedy. Heterogeneous problems usually need DP.

Activity selection is greedy. Weighted activity selection is DP. That single word "weighted" flips the entire approach. If you miss it in an interview, you will spend 20 minutes implementing a wrong greedy, get stuck when the interviewer mentions a counterexample, and spend the remaining time trying to patch a solution that was never going to work.

Greedy for DP-requiring problems is one of the top mistakes engineers make in coding interviews. If you want to practice catching that distinction in real time, SpaceComplexity runs mock interviews where an AI interviewer will push back exactly when your greedy intuition goes wrong, with rubric-based feedback on your justification, not just your code.

Recap

Greedy algorithms make the locally optimal choice at each step and never reconsider.
Two required properties: the greedy-choice property (a global optimum includes the greedy choice) and optimal substructure (solving the remaining subproblem optimally extends to a global optimum).
Greedy stays ahead: prove that after every step, the greedy partial solution is at least as good as any other on some measure. Interval scheduling uses this.
Exchange argument: show that any optimal solution can be transformed into the greedy solution by swapping choices without losing quality. Minimize-lateness and Huffman use this.
Greedy fails when a local choice can block a globally better path: 0/1 knapsack, non-canonical coin change, weighted job scheduling all require DP.
The diagnostic: sort by the natural criterion, trace a few examples, look for a counterexample. If you find one, reach for DP.
In interviews, the word "weighted" added to any greedy-solvable problem usually turns it into a DP problem.

If you want to sharpen your understanding of how this connects to DP design, the dynamic programming framework post covers the memoization and tabulation patterns that pick up where greedy leaves off.

For interval-style problems, sliding window covers a related technique that scans sorted or sequential data in O(n) without the greedy commitment step.