Quadtree Data Structure: Index Any 2D Space by Dividing It Into Fours

May 27, 202637 min read
dsaalgorithmsinterview-prepdata-structures
Quadtree Data Structure: Index Any 2D Space by Dividing It Into Fours
TL;DR
  • PR quadtree uses fixed midpoint partitioning; point quadtree uses data points as pivots. Use PR.
  • Insert is O(log n) average, O(n) worst; range query is O(√n + k) for uniformly distributed data.
  • Internal nodes hold no data; only leaf nodes store points, up to CAPACITY each.
  • Center + half-dimension parameterization makes containment and intersection tests branch-free.
  • Barnes-Hut uses a quadtree to reduce N-body simulation from O(n²) to O(n log n).
  • LeetCode 427 (Construct Quad Tree) is solved in O(n²) with 2D prefix sums for O(1) uniformity checks.
  • Rebuild each frame for game collision detection; dynamic quadtrees are rarely worth the complexity.

You have a million GPS coordinates. Someone asks: which of these are within 10 km of Times Square? A linear scan works. It also means checking every single point in your dataset, one by one, every time anyone asks. That's fine for a cron job that runs at 3am. It's less fine for a ride-hailing app during New Year's Eve at Times Square itself.

The quadtree data structure solves this by recursively splitting a 2D region into four equal quadrants until each region holds at most a small bucket of points. A range query then walks only the quadrants that overlap your search area, skipping everything else. One spatial partition, logarithmic traversal.

Raphael Finkel and Jon Louis Bentley named this structure in "Quad Trees: A Data Structure for Retrieval on Composite Keys" (Acta Informatica, 1974). The point-region (PR) variant, which fixes the partitioning geometry rather than letting data points drive it, became the dominant form in spatial indexing. The k-d tree is a first cousin. The octree is the 3D generalization. All of them partition space hierarchically to make queries skip irrelevant regions.

Reach for a quadtree when your data lives in 2D, and you need repeated range queries, nearest-neighbor searches, or collision detection between objects scattered across a plane.

A sign that reads: PULL, IF THAT DOESN'T WORK, PUSH, IF THAT DOESN'T WORK, WE'RE PROBABLY CLOSED. Caption: "Brute-forcing a code solution."

Linear scan energy. We're building something better.


How the Quadtree Data Structure Works Internally

A PR quadtree starts with a single rectangular boundary covering your entire coordinate space. Call it the root. It is one giant bucket.

When that bucket overflows (more than CAPACITY points), it subdivides into four equal child regions: northwest, northeast, southwest, southeast. Every point already in the overflowing bucket gets redistributed into whichever child contains it. If a child overflows too, it subdivides again. The recursion stops when every leaf holds at most CAPACITY points.

Three-step quadtree subdivision: one full bucket splits into four quadrants, and then an overflowing SW quadrant splits again

Three steps: one bucket, four quadrants, and recursive subdivision when a child overflows.

Internal nodes hold no data. Only leaf nodes hold points. Once a node subdivides, it becomes an internal node with four children and an empty point list. The points migrate down to the leaves.

The Node in Memory

Each node stores:

  • Bounding box: center (cx, cy) and half-dimensions (hw, hh), 4 floats, 32 bytes
  • Four child pointers: NW, NE, SW, SE (4 × 8 bytes = 32 bytes on a 64-bit machine)
  • Point bucket: a list of up to CAPACITY points (leaf nodes only)
  • A divided flag: 1 byte

Side-by-side comparison of leaf node and internal node memory layouts, showing field sizes in bytes

Left: a leaf node holds points but has null child pointers. Right: an internal node has live child pointers and an empty point list. Never both at once.

Why Center + Half-Dimension?

The (cx, cy, hw, hh) parameterization makes intersection and containment tests symmetric and branch-free. Containment: cx-hw <= px <= cx+hw and cy-hh <= py <= cy+hh. Four comparisons. Subdividing halves hw and hh and shifts cx, cy by half-dimensions. No division, no case analysis. The four child boundaries write themselves:

NW: (cx-hw/2, cy-hh/2, hw/2, hh/2)
NE: (cx+hw/2, cy-hh/2, hw/2, hh/2)
SW: (cx-hw/2, cy+hh/2, hw/2, hh/2)
SE: (cx+hw/2, cy+hh/2, hw/2, hh/2)

The quadtree is a spatial trie. The structure is determined entirely by the geometry of the space, not the order of insertion. Inserting the same points in a different order produces an identical tree.

Point Quadtree vs PR Quadtree

Finkel and Bentley's original structure is the point quadtree: each internal node stores exactly one point, and that point defines the partitioning boundaries. The four quadrants are the four half-planes defined by that point's x and y coordinates. The result is data-dependent partitioning, like a BST for 2D space.

The PR quadtree fixes the partitioning: boundaries are always the midpoints of the parent region, regardless of where points fall. This is easier to implement, gives predictable depth for uniform data, and avoids the nasty deletion problem of the point quadtree (deleting an internal node of a point quadtree requires reinserting its entire subtree). The PR quadtree is what almost everyone means when they say "quadtree." This article implements the PR variant throughout.


Core Operations

OperationAverage CaseWorst CaseAux Space
InsertO(log n)O(n)O(1)
Point searchO(log n)O(n)O(1)
DeleteO(log n)O(n)O(1)
Range queryO(√n + k)O(n)O(log n)
Build (n points)O(n log n)O(n²)O(n)
SpaceN/AO(n)O(n)

k = number of points in the result set.

Why Insert Is O(log n) Average

Each subdivision halves the side length of the region. With n uniformly distributed points over a unit square, after d levels the average number of points per region drops to n / 4^d. Subdivision stops when n / 4^d ≤ CAPACITY, so d ≈ (log₂ n) / 2 = O(log n) levels. Each level does O(1) work: test containment in each of four children. Average insertion depth is O(log n).

A cleaner way to see it: the depth is O(log Δ), where Δ is the spread of the point set, defined as (maximum pairwise distance) / (minimum pairwise distance). For n points on an integer grid of side n, Δ = O(n√2), so depth = O(log n). For uniformly random points, depth is O(log n) with high probability.

Why Worst Case Is O(n)

Pack all n points inside a region of diameter ε, smaller than any subdivision threshold. The tree subdivides in the same corner repeatedly, one point per leaf, reaching depth O(n). No rebalancing occurs. This is the 2D equivalent of inserting sorted values into a BST. The formal bound is that n coincident points produce a tree of depth O(n).

In practice, worst case means pathological data. GPS coordinates of real cities, game object positions, astrophysical simulation particles: all of these behave well. The edge case appears when someone points a quadtree at data that isn't actually 2D spatial, like using (latitude, user_id) as a "point." Don't do that.

Why Range Query Is O(√n + k)

For n uniformly distributed points in a unit square, a rectangular query window produces two costs:

  1. Interior: all leaf cells completely inside the window are visited. There are O(n × A) such cells where A is the query area, and each produces its points directly. That gives O(k) work for collecting the k results.

  2. Boundary: cells that straddle the query window's perimeter. At depth d, cells have side s = 1/2^(d/2) approximately. The query perimeter has length P. It intersects O(P/s) cells at that level. Summing over the O(log n) levels and optimizing: the boundary contribution is O(P × √n). For a square query window, P = O(1), so boundary cells = O(√n).

Total: O(√n + k) expected for uniformly distributed data. The √n term comes from the boundary; k comes from collecting results.

The pruning is the key insight. Entire subtrees get skipped via a single intersects check. The diagram below makes it concrete.

Range query diagram: spatial view showing pruned (gray) and visited (blue) quadrants alongside the corresponding tree traversal with pruned and expanded branches

Gray quadrants never get touched. The query rectangle rules them out with one intersects call each. Only blue and green nodes get recursed into.

Why Delete Is O(log n) Average

In the PR quadtree, deletion traverses to the target leaf (O(log n) depth), removes the point, then optionally merges the parent back into a leaf if all four siblings together hold fewer than CAPACITY points. The merge is O(CAPACITY) per level. Contrast this with the point quadtree, where deleting an internal node forces O(n) reinsertion of the entire subtree. One more reason everyone uses PR.


One Structure, Every Language

These implementations use a bucket PR quadtree: capacity of 4 points per leaf, center + half-dimension boundaries, with insert, query (range search), and subdivide.

from __future__ import annotations from dataclasses import dataclass CAPACITY = 4 @dataclass class Point: x: float y: float @dataclass class Rect: cx: float cy: float hw: float # half-width hh: float # half-height def contains(self, p: Point) -> bool: return (self.cx - self.hw <= p.x <= self.cx + self.hw and self.cy - self.hh <= p.y <= self.cy + self.hh) def intersects(self, other: Rect) -> bool: return not ( other.cx - other.hw > self.cx + self.hw or other.cx + other.hw < self.cx - self.hw or other.cy - other.hh > self.cy + self.hh or other.cy + other.hh < self.cy - self.hh ) class QuadTree: def __init__(self, boundary: Rect, capacity: int = CAPACITY) -> None: self.boundary = boundary self.capacity = capacity self.points: list[Point] = [] self.divided = False self.nw = self.ne = self.sw = self.se = None def insert(self, p: Point) -> bool: if not self.boundary.contains(p): return False if not self.divided and len(self.points) < self.capacity: self.points.append(p) return True if not self.divided: self._subdivide() return (self.nw.insert(p) or self.ne.insert(p) or # type: ignore[union-attr] self.sw.insert(p) or self.se.insert(p)) # type: ignore[union-attr] def _subdivide(self) -> None: cx, cy = self.boundary.cx, self.boundary.cy hw, hh = self.boundary.hw / 2, self.boundary.hh / 2 self.nw = QuadTree(Rect(cx - hw, cy - hh, hw, hh), self.capacity) self.ne = QuadTree(Rect(cx + hw, cy - hh, hw, hh), self.capacity) self.sw = QuadTree(Rect(cx - hw, cy + hh, hw, hh), self.capacity) self.se = QuadTree(Rect(cx + hw, cy + hh, hw, hh), self.capacity) self.divided = True for p in self.points: (self.nw.insert(p) or self.ne.insert(p) or self.sw.insert(p) or self.se.insert(p)) self.points = [] def query(self, region: Rect, found: list[Point] | None = None) -> list[Point]: if found is None: found = [] if not self.boundary.intersects(region): return found if self.divided: self.nw.query(region, found) # type: ignore[union-attr] self.ne.query(region, found) # type: ignore[union-attr] self.sw.query(region, found) # type: ignore[union-attr] self.se.query(region, found) # type: ignore[union-attr] else: for p in self.points: if region.contains(p): found.append(p) return found

What Problems It Actually Solves

Anything where your data has coordinates and your queries ask about proximity. If you're checking O(n) pairs per query and you have O(n) queries, that's O(n²). The quadtree is usually the move.

The specific problem classes:

  • 2D range search. Given a rectangular region, return all points inside it. The quadtree converts this from O(n) to O(√n + k) for well-distributed data.

  • Nearest neighbor search. Traverse the tree, pruning branches whose nearest possible point is farther than the current best candidate. Average O(log n) for uniform distributions.

  • Collision detection. A game loop has N objects. Brute force checks every pair: O(N²) per frame. With a quadtree, each object queries only its local neighborhood: O(N√N) total. For N=10,000 objects, that is 10^8 comparisons reduced to roughly 10^6.

  • N-body simulation (Barnes-Hut). Astrophysicists Barnes and Hut (1986) used a quadtree to reduce N-body gravitational simulation from O(N²) to O(N log N). The key insight: if a cluster of bodies is far enough away (ratio s/d < θ, where s is cluster width and d is distance), approximate the whole cluster as a single body at its center of mass. The quadtree makes this approximation hierarchical and automatic.

  • Image compression (region quadtree). A region quadtree subdivides an image until each leaf covers a region of uniform color. A 1024x1024 image with large uniform regions compresses dramatically because uniform subtrees collapse to a single leaf. This is the basis for several historical image compression formats.

  • Geospatial indexing. Location-based services (finding nearby restaurants, matching drivers to riders, routing) use quadtrees or their derivatives. Google's S2 library, Uber's H3, and many GIS systems build on quad-decomposed coordinate spaces.

  • Mesh generation and LOD. Rendering engines use quadtrees to decide which terrain patches to render at full detail versus lower-resolution approximations, based on viewer distance.


Recognizing the Quadtree Pattern

The clearest signal is 2D spatial data combined with repeated proximity queries. If your constraints include coordinates and operations like "find nearby," "find overlapping," or "find within region," you should be thinking quadtree.

More specifically:

  • The problem mentions (x, y) coordinates or a 2D grid
  • You need to repeatedly find all objects within some distance or bounding box
  • Brute force requires O(n) per query and you have many queries
  • The data is dynamic (insertions and deletions happen alongside queries)
  • You need to detect collisions between objects in a 2D plane

Less obvious signals:

  • A 2D grid problem where you need to represent "uniform" vs "mixed" regions hierarchically (the image quadtree variant)
  • The problem asks you to "construct" a quad tree from a given grid (LeetCode 427)

Worked Example 1: LeetCode 427, Construct Quad Tree

Problem: Given an n×n binary grid of 0s and 1s, construct a quad tree representing it. A node is a leaf if its entire subgrid is uniform. Otherwise, split into four quadrants.

Why quadtree fits: The problem literally defines the region quadtree structure. Each node is either uniform (leaf) or mixed (internal node with four children).

Naive approach: For each subgrid, scan all cells to check uniformity. O(n²) per level, O(log n) levels: O(n² log n) total.

Better approach: Precompute a 2D prefix sum. Then isUniform(r1, c1, r2, c2) runs in O(1), and each subgrid check costs O(1). Total: O(n²) build time, O(n²) space.

def construct(grid: list[list[int]]) -> 'Node': n = len(grid) # prefix sums for O(1) region sum queries ps = [[0] * (n + 1) for _ in range(n + 1)] for r in range(n): for c in range(n): ps[r+1][c+1] = grid[r][c] + ps[r][c+1] + ps[r+1][c] - ps[r][c] def region_sum(r1, c1, r2, c2): return ps[r2+1][c2+1] - ps[r1][c2+1] - ps[r2+1][c1] + ps[r1][c1] def build(r1, c1, r2, c2): total = region_sum(r1, c1, r2, c2) area = (r2 - r1 + 1) * (c2 - c1 + 1) if total == 0 or total == area: return Node(total > 0, True) # leaf mid_r, mid_c = (r1 + r2) // 2, (c1 + c2) // 2 node = Node(True, False) # internal node node.topLeft = build(r1, c1, mid_r, mid_c) node.topRight = build(r1, mid_c+1, mid_r, c2) node.bottomLeft = build(mid_r+1, c1, r2, mid_c) node.bottomRight = build(mid_r+1, mid_c+1, r2, c2) return node return build(0, 0, n-1, n-1)

The signal is clear once you notice: "recursively divide the grid into four quadrants until each region is uniform." That description is the region quadtree.

Worked Example 2: Collision Detection in a Game Loop

Problem: You're building a 2D game. Each frame, N objects move. You need to find all pairs of objects whose bounding boxes overlap. Brute force is O(N²) per frame, which dies around N=500.

Why quadtree fits: Objects are in 2D space, the query is proximity-based, and the data is dynamic (objects move between frames).

Approach:

  1. Rebuild the quadtree each frame by inserting all N objects. O(N log N) average.
  2. For each object, query the quadtree with a slightly enlarged bounding box. O(√N + k) per query, where k is the number of nearby objects.
  3. Check actual overlap only against the candidates.
Frame update loop:
┌─────────────────────────────────────┐
│  qt = new QuadTree(worldBounds)     │  O(1)
│  for each object o:                 │
│      qt.insert(o.bounds.center)     │  O(log N) avg each
│  for each object o:                 │
│      candidates = qt.query(         │  O(√N + k) each
│          expanded(o.bounds))        │
│      for c in candidates:           │
│          check_exact_overlap(o, c)  │  O(k) each
└─────────────────────────────────────┘
Total: O(N log N) build + O(N√N) queries
vs brute force O(N²)

For N=1000: brute force is 1,000,000 pair checks; quadtree is roughly 32,000 tree operations plus actual pair checks only for nearby objects.

When to reach for a k-d tree instead: k-d trees have the same asymptotic complexity for range queries but tend to be more cache-friendly and better for static datasets. Quadtrees rebuild faster for highly dynamic data because the fixed-geometry partitioning means you don't need to recompute split points. Both are fine for interviews; the interviewer wants to see that you know spatial indexing exists and understand the range query speedup.


The Ways You'll Get This Wrong

Choosing CAPACITY. CAPACITY 1 gives maximum depth and wastes memory on nearly-empty internal nodes. CAPACITY 100 means you rarely subdivide and queries degrade toward linear. In practice, CAPACITY between 4 and 16 works well for most 2D point datasets. The sweet spot minimizes (depth × bucket scan cost). Pick 4. Move it later if profiling says to.

Rebuilding vs updating. A fully dynamic quadtree that supports efficient deletion and rebalancing is complex. For game collision detection, the standard industry practice is to simply rebuild the quadtree from scratch each frame. It sounds wasteful. For N=10,000 objects and 60 frames per second, rebuilding takes roughly 1.5ms on modern hardware. Acceptable. Shipping is acceptable.

The clustering pathology. If all your points cluster in one corner of the coordinate space, the tree degenerates toward O(n) depth in that region. The fix: choose your root boundary tightly around the actual data extent, not some theoretical maximum. If your GPS coordinates are all in New York, don't root the quadtree in a box covering the whole Earth. You will do this at least once.

Floating point boundary comparisons. Points exactly on the boundary between two quadrants can behave differently depending on whether your contains check uses < or <=. Pick one convention and apply it consistently: cx-hw <= px <= cx+hw. A point on the exact midpoint of the NW/NE boundary always goes NE with this convention. What actually matters is that every point lands in exactly one child, and that you don't change the convention when a mysterious bug shows up. Pick it and commit.


Quick Recap

  • A quadtree recursively subdivides a 2D region into four equal quadrants until each leaf holds at most CAPACITY points.
  • The PR (point-region) variant uses fixed midpoint partitioning; the point quadtree uses data points as partitioning pivots. Use PR.
  • Insert and point search are O(log n) average, O(n) worst. Worst case requires pathologically clustered data.
  • Range queries cost O(√n + k) expected for uniformly distributed data, where k is the result count.
  • The key insight behind the efficiency: the intersects check lets you prune entire subtrees without visiting their contents.
  • Delete in PR quadtrees is O(log n). Delete in point quadtrees is O(n) due to subtree reinsertion.
  • Real-world applications: game collision detection, geospatial indexing, N-body simulation (Barnes-Hut), image compression, LOD rendering.
  • Pattern signals: 2D coordinates, repeated "find nearby" queries, explicit "construct quad tree" problem statement.
  • Use a k-d tree for static datasets with frequent nearest-neighbor queries; use a quadtree for dynamic 2D data or when rebuilding each cycle is acceptable.

Explaining quadtrees in an interview is its own separate challenge from implementing them. SpaceComplexity runs voice-based DSA mock interviews with rubric-based feedback, so you can practice talking through spatial data structures out loud before the real interview clock starts.

For more on related structures, see binary search trees for the 1D analogue and the k-d tree deep dive for the generalization to arbitrary dimensions.


Further Reading