Code Review Interview: You're Hunting Bugs. That's the Trap.

You get a piece of code. You have 45 minutes. Your instinct kicks in: start at line one, hunt bugs, compile a list, present it confidently.

That's the wrong game.

The code review interview isn't a bug-finding contest. It's a judgment test. What you look at first, what you skip, what you push back on when challenged, how you explain a suggestion to someone who wrote the code and definitely doesn't love hearing it. That's what interviewers are scoring. Candidates who treat it as a scavenger hunt miss the point entirely, and usually miss the actual SQL injection too.

Who Uses This Round

Code review interviews show up most at Google for Engineering Manager roles, where candidates can choose between a coding round and a code review. Many say the code review is the easier pick. They're right. But only if you understand what's being tested.

The format is spreading. Companies at SDE 2 and SDE 3 levels increasingly use it as an alternative to algorithmic problems, particularly for senior engineers where "can you implement Dijkstra from memory?" is a weak signal. Some companies run a hybrid: you write code for 40 minutes, then spend 35 minutes reviewing it with the interviewer treating it as someone else's PR. That's harder than it sounds. Reviewing your own fresh code without defensively explaining every decision takes restraint most people don't have.

The round runs about 45 minutes: 5 minutes of context-setting, 25 minutes of review, and 10-15 minutes of discussion. The discussion is where the real interview happens.

What the Code Actually Looks Like

You'll receive a realistic snippet of 50-200 lines. It won't be a toy problem. It'll look like something a junior engineer pushed on a Friday afternoon, three minutes before standup: mostly working, a few good ideas, and several problems a careful reviewer would catch.

Here's a condensed version:

import sqlite3

def get_user(user_id):
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    # Direct string interpolation
    query = f"SELECT * FROM users WHERE id = {user_id}"
    cursor.execute(query)
    return cursor.fetchone()
    # Connection never closed

def update_user_balance(user_id, amount):
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    cursor.execute("SELECT balance FROM users WHERE id = ?", (user_id,))
    current = cursor.fetchone()[0]   # Crashes if user not found
    new_balance = current + amount
    cursor.execute(
        "UPDATE users SET balance = ? WHERE id = ?",
        (new_balance, user_id)
    )
    conn.commit()
    conn.close()

def get_all_users():
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")
    return cursor.fetchall()         # Unbounded, no pagination

This code works on the happy path. It fails in interesting ways everywhere else. Someone pushed it. It's in production. The user count is growing.

What's Actually Being Scored

Companies that use this format score roughly four things:

Technical detection (40%): Did you find the issues that matter? Missing a critical security bug is a hard fail. Spending five minutes on indentation is a soft fail of a different kind.

Prioritization (25%): Which issue did you lead with? Interviewers pay close attention to what you bring up first. "I noticed a SQL injection vulnerability in get_user" is a strong opening. "The variable names could be more descriptive" is not.

Communication (25%): Can you explain why something is a problem in a way that would actually change behavior? Not condescending, not vague. Specific, reasoned, concrete. This is the same signal interviewers look for when you explain your thinking out loud during a coding round.

Process (10%): Did you ask clarifying questions before diving in? Did you read the whole thing before commenting? Did you acknowledge what was working?

Technical and prioritization together are 65% of your score. Style and naming together are maybe 5%. Every minute you spend on formatting is a minute the interviewer is quietly updating their mental model of your priorities.

The Three-Pass Framework

Do three passes. Fast, then careful, then deep.

First pass (3-5 minutes): Read everything top to bottom without stopping. The goal is context. What is this code supposed to do? What data flows through it? Asking "does this handle production traffic or internal tooling?" before you leave a comment shows maturity. It's the same impulse that makes clarifying questions valuable in a coding round. A SQL injection in an internal admin tool still gets fixed, just not at 2 AM.

Second pass (15-20 minutes): Go line by line and log every observation. Don't prioritize yet. Just capture. SQL injection, missing null check, resource leak, unbounded query. Write them all down. Follow the data. What happens if fetchone() returns None? Trace execution mentally. Narrate as you go: silence during a review is the same weak signal as silence during a coding interview.
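The fetchone() question can be traced concretely. The sketch below is a minimal illustration, assuming an in-memory SQLite database and a hypothetical two-column users table (the snippet's real schema isn't shown). The helper read_balance is an invented name, not part of the code under review.

```python
import sqlite3

# Hypothetical schema for illustration; the snippet's real table is not shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO users (id, balance) VALUES (1, 100)")


def read_balance(conn, user_id):
    """Return the user's balance, or None when no such user exists."""
    row = conn.execute(
        "SELECT balance FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    # fetchone() returns None when no row matches, so row[0] would raise
    # TypeError. Check before indexing.
    if row is None:
        return None
    return row[0]


# What the unguarded code sees for an unknown id:
missing_row = conn.execute(
    "SELECT balance FROM users WHERE id = ?", (999,)
).fetchone()
```

Here missing_row is None, which is why the unguarded `cursor.fetchone()[0]` in update_user_balance fails for an unknown user. Whether the fix should return None, raise a domain-specific error, or do something else is a design question worth raising in the discussion.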

Third pass (5-10 minutes): Organize by severity and decide what to say out loud first. Cut style comments to one or two at most. Your goal is to leave the interviewer with the impression that you found the things that mattered and understood why they mattered. Not that you have opinions about tabs versus spaces.

The Bug Priority Stack

Not all bugs are equal. Interviewers intentionally plant issues at different severity levels to test your sorting judgment.

[Figure: bug severity priority stack, from must-catch security issues at the top to low-priority style at the bottom]

Must catch. SQL injection, race conditions, authentication bypasses, null dereferences, off-by-one errors that corrupt data. In the snippet above, the f-string in get_user is direct SQL injection. An input like 1 OR 1=1 makes the WHERE clause match every row in the table. In update_user_balance, fetchone()[0] raises a TypeError when the user doesn't exist. This is the top of the stack.
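To make the injection concrete, here is a small sketch comparing the interpolated query with a parameterized one. It assumes an in-memory SQLite database with a hypothetical users table and made-up rows. Only the query shapes come from the snippet.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "ada"), (2, "grace"), (3, "linus")],
)

malicious_id = "1 OR 1=1"

# Interpolated: the input becomes part of the SQL text, so the WHERE clause
# reads "id = 1 OR 1=1" and matches every row.
unsafe_rows = conn.execute(
    f"SELECT * FROM users WHERE id = {malicious_id}"
).fetchall()

# Parameterized: the input is bound as a single value and compared to id,
# so the same string matches nothing.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE id = ?", (malicious_id,)
).fetchall()
```

With the three rows above, unsafe_rows contains all three users and safe_rows is empty. The fix you would suggest in the review is the `?` placeholder form, which update_user_balance in the same snippet already uses.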

Should catch. The update_user_balance function has a classic read-modify-write race condition. Two concurrent requests read balance = 100, both add 50, both write 150. One increment is lost. This is a real production bug that only shows up under load, which is exactly what experienced engineers learn to spot.
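One common way to close this gap is to let the database do the arithmetic in a single statement instead of reading, adding in Python, and writing back. The sketch below is one possible fix under assumptions, not the only one. The function name add_to_balance and the in-memory table are hypothetical.

```python
import sqlite3

# Hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO users (id, balance) VALUES (1, 100)")
conn.commit()


def add_to_balance(conn, user_id, amount):
    """Apply the increment in one UPDATE; return False if no user matched."""
    cursor = conn.execute(
        "UPDATE users SET balance = balance + ? WHERE id = ?",
        (amount, user_id),
    )
    conn.commit()
    # rowcount is 0 when the id does not exist, which also replaces the
    # unchecked fetchone()[0] from the original function.
    return cursor.rowcount == 1


add_to_balance(conn, 1, 50)
add_to_balance(conn, 1, 50)
final_balance = conn.execute(
    "SELECT balance FROM users WHERE id = 1"
).fetchone()[0]
```

Because each increment is applied relative to the stored value, two increments of 50 on a balance of 100 end at 200 rather than 150. There is no stale value held in application memory between the read and the write.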

Good to catch. The connection is never closed in get_user or get_all_users, and update_user_balance only closes it on the happy path. get_all_users loads every row into memory with no limit. Real problems, but not going to page anyone at 3 AM.
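Both of these have small, conventional fixes. The sketch below shows one way to bound the query and guarantee the connection is closed. The function name get_users_page, its default page size, and the db_path parameter are assumptions for illustration.

```python
import sqlite3
from contextlib import closing


def get_users_page(db_path, limit=100, offset=0):
    """Return at most `limit` rows, closing the connection even on error."""
    # A sqlite3 connection used directly as a context manager only commits or
    # rolls back; contextlib.closing is what actually closes it.
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(
            "SELECT * FROM users ORDER BY id LIMIT ? OFFSET ?",
            (limit, offset),
        )
        return cursor.fetchall()
```

LIMIT/OFFSET is the simplest form of pagination. For large tables, paging by "id greater than the last id seen" usually scales better, and that trade-off is a reasonable thing to mention once and move on.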

Low priority. Naming, formatting, docstrings. Mention one to show you're well-rounded. Spend more than two minutes here and you've signaled that you don't know the difference between cosmetic and critical.

The question "if you could only leave three comments on this PR, which would they be?" is not hypothetical. Interviewers ask it, and it's a clean filter. Candidates who answer with a security bug, a race condition, and a null dereference understand software risk. Candidates who answer with variable names and a docstring are the people who write "nit: use double quotes" on the PR that drops the users table.
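The three-comment filter amounts to sorting your notes by severity and taking the top of the list. Here is a toy sketch of that step, using the four tiers from the stack above. The Finding and Severity types are hypothetical, and the findings are the ones from the example snippet.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value means the comment is raised earlier.
    MUST_CATCH = 0
    SHOULD_CATCH = 1
    GOOD_TO_CATCH = 2
    LOW_PRIORITY = 3


@dataclass(frozen=True)
class Finding:
    severity: Severity
    location: str
    note: str


# Logged in the order a line-by-line pass might produce them.
findings = [
    Finding(Severity.LOW_PRIORITY, "get_user", "no docstring"),
    Finding(Severity.GOOD_TO_CATCH, "get_all_users", "unbounded fetchall()"),
    Finding(Severity.MUST_CATCH, "get_user", "SQL injection via f-string"),
    Finding(Severity.SHOULD_CATCH, "update_user_balance", "read-modify-write race"),
    Finding(Severity.GOOD_TO_CATCH, "get_user", "connection never closed"),
    Finding(Severity.MUST_CATCH, "update_user_balance", "fetchone()[0] on missing user"),
]


def top_comments(findings, limit=3):
    """Return the `limit` most severe findings."""
    # sorted() is stable, so findings of equal severity keep their logged order.
    return sorted(findings, key=lambda f: f.severity)[:limit]


top_three = top_comments(findings)
```

For this list, the three that survive are the injection, the unchecked fetchone(), and the race condition. Everything else is material for the discussion if time allows.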

The Pushback Moment

Most candidates lose points here, and most don't realize it.

The interviewer plays the code's author. After you flag the race condition in update_user_balance, they say: "We only have one write thread, so race conditions aren't possible here."

This is a probe. Not a correction. They want to see what you do.

Wrong response: "Oh, good point. Never mind then." You folded the moment someone pushed back. Your review now holds up only until someone expresses mild disagreement.

Wrong response: "No, it's definitely a bug." Doubling down without engaging their argument is the other failure mode. Now you're just two people disagreeing loudly.

Right response: Engage the constraint and probe it. "If we're guaranteed single-threaded writes now, that fixes the concurrency issue. A few follow-up questions: is that constraint enforced by the system or by convention? If another engineer adds an async endpoint next month, does this break silently? I'd want to add a comment here documenting that assumption so the invariant stays visible."

That response shows you heard the pushback. You understand the context changes the severity. And you're thinking about the code's future life, not just today's PR. The pushback moment is the real interview. How you handle one challenge tells the interviewer far more than the list of issues you found.

Label your comments to make this easier. Some engineers use Must/Should/Could: "Must" for blocking issues, "Should" for meaningful improvements, "Could" for style preferences. It separates "this will cause data loss" from "this could be cleaner" before anyone pushes back.

How to Prepare for a Code Review Interview

Read merged PRs on GitHub. Pick an open-source project in your language. Go to closed pull requests, read the diffs and the review comments. Pay attention to what maintainers flag versus what they let through. You'll internalize what experienced engineers actually care about, and the hierarchy starts to feel obvious.

Review your own old code. Pull something you wrote six months ago and review it like it belongs to a stranger. You'll find things you'd catch immediately now that you missed then. This is useful calibration.

Do a mock with feedback. The hardest part to simulate is the discussion, especially the pushback. Practicing with an AI interviewer that scores your prioritization and communication is the closest thing to the real session. SpaceComplexity runs voice-based mock interviews with rubric feedback across exactly these dimensions.

One reference worth reading before your loop: Google's engineering practices guide documents what their reviewers look for, in priority order. Design first, then functionality, then complexity, then tests. Naming and style are near the bottom. That ordering is not accidental, and arriving with that mental hierarchy is a concrete differentiator.