Graph vs Tree: Why One DFS Needs a Visited Set and One Doesn't

You write the same DFS on a binary tree every week. Recurse left, recurse right, combine results. No visited set. No cycle check. It just works.

Then you hit a graph problem and copy the same template. One missing visited set later, your code is looping forever. Your fan spins up. You add visited and move on without thinking about why you needed it.

The why matters. The graph vs tree confusion starts with shared vocabulary: nodes, edges, traversal. But the structural rules are completely different, and those differences dictate how you code and how you argue correctness.

A tree is a graph with two extra constraints. Remove those constraints and you get a general graph.

A Tree Is a Constrained Graph

In graph theory, a tree is an undirected graph that is both connected (there is a path between every pair of nodes) and acyclic (there are no cycles). That is the entire definition. Two constraints.

Everything you know about trees follows from those two. Five equivalent ways to describe a tree with N vertices:

Connected and acyclic
Connected and has exactly N-1 edges
Acyclic and has exactly N-1 edges
Connected, and removing any single edge disconnects it
There is exactly one path between every pair of vertices

Any one of these definitions implies all the others. Pick whichever one matches how your problem describes itself.

The N-1 edges fact is worth sitting with. Think inductively. One node has zero edges. Connect a second: one edge, one new vertex. Connect a third: one more edge, one more vertex. Each new node contributes exactly one edge. N nodes requires exactly N-1 edges. Add one extra edge and you have created a cycle. Remove one edge and the graph splits into two components.

The moment a problem says "N nodes, N-1 edges" you know you have a tree and can skip the visited set.

What Happens When You Remove the Rules

A general graph relaxes both tree constraints. It can have cycles, be disconnected, directed, or weighted.

This produces a spectrum from most constrained to least:

Tree: connected, acyclic, undirected
Forest: acyclic, undirected, possibly disconnected (a collection of disjoint trees)
DAG: directed acyclic graph, no directed cycles but direction matters
Undirected graph: cycles allowed, disconnection allowed
Directed graph: all of the above

The constraint spectrum from Tree to Directed Graph, showing what each step relaxes Each arrow drops or adds one constraint. A tree sits at the most restricted end; a directed graph with cycles at the least.

Most graph problems on LeetCode sit somewhere on this spectrum. Course schedule problems give you a DAG, or ask you to detect that you do not have one. Social network problems use undirected graphs. File systems are trees. The label "graph" does not always tell you where on the spectrum it sits, so you have to read the constraints.

How You Store Them in Code

Trees

Trees are almost always stored as pointer nodes.

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

Trees always use O(N) total space: N nodes, N-1 edges, and the edges are implicit in the child pointers.

Graphs

Graphs need explicit edge storage. Two main options:

Adjacency list: graph[u] holds the list of u's neighbors. Space is O(V + E). For a sparse graph with millions of vertices but few connections per node, this is far more efficient than any matrix.

Adjacency matrix: A V×V boolean grid where matrix[u][v] = 1 if an edge exists. Space is O(V²), regardless of how many edges actually exist.

Same 5-node graph shown as adjacency list (left) and adjacency matrix (right) The adjacency list holds 10 entries for 5 edges. The matrix holds 25 cells and wastes 15 of them on zeros. Sparse graphs make this worse fast.

Both DFS and BFS run in O(V + E) on an adjacency list and O(V²) on an adjacency matrix. Visiting every neighbor of every node costs proportional to total stored edges. For sparse graphs, which covers most interview problems, use the adjacency list.

The matrix's one real advantage is O(1) edge existence queries. Algorithms like Floyd-Warshall that constantly ask "is there a direct edge from u to v?" benefit from this. Everything else favors the list.

For more on the representation tradeoffs and when to reach for each, see Graph Data Structure Interview: Adjacency Lists, BFS, DFS, and Thinking in Graphs.

Graph vs Tree DFS: The Code Actually Diverges Here

This is where the structural difference hits your keyboard.

Tree DFS requires no visited set. Since trees are acyclic and every path between two nodes is unique, you cannot revisit a node during DFS. The structure prevents it. You just recurse.

def height(root):
    if not root:
        return 0
    return 1 + max(height(root.left), height(root.right))

Graph DFS requires a visited set. Without one, a cycle sends you into an infinite loop. With one, you guarantee each node is processed exactly once.

def dfs(graph, node, visited):
    visited.add(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)

# Outer loop handles disconnected components
visited = set()
for v in range(n):
    if v not in visited:
        dfs(graph, v, visited)

Notice the outer loop. Trees do not need it because trees are connected by definition. Graphs can be disconnected, so you must attempt DFS from every unvisited node to cover all components. Skip the outer loop and you silently miss entire parts of the graph.

Tree DFS (left, clean) vs Graph DFS (right, with visited set and cycle edge blocked) Left: tree DFS recurses freely. Right: graph DFS tracks gray nodes on the call stack. The dashed red arc is a back edge, following it would mean infinite recursion.

The same template applies to BFS. The BFS vs DFS post covers how to choose between the two once you know you are on a graph.

Complexity at a Glance

Operation	General Tree	Balanced BST	Graph (adj list)	Graph (adj matrix)
Space	O(N)	O(N)	O(V + E)	O(V²)
DFS / BFS	O(N)	O(N)	O(V + E)	O(V²)
Search	O(N)	O(log N)	O(V + E)	O(V + E)
Insert node	O(1)	O(log N)	O(1)	O(1)
Check edge (u,v)	N/A	N/A	O(degree(u))	O(1)

The adjacency matrix's one real advantage is O(1) edge existence queries. Algorithms like Floyd-Warshall benefit from this. Everything else favors the adjacency list for sparse inputs.

The Problem Shape Tells You Which Structure You Have

Reach for a tree when:

The data is hierarchical. File systems, the DOM, org charts, JSON documents.
You need ordered insertions and deletions: BST.
You need priority extraction: heap.
You need prefix lookups: trie.
The problem says "N nodes, N-1 edges" or explicitly states no cycles.
You need to aggregate over subtrees (segment trees, Fenwick trees).

Reach for a graph when:

Connections are arbitrary. Social networks, road maps, package dependencies.
The problem mentions paths, connectivity, or reachability without implying a root.
Cycles are possible and matter (course scheduling, deadlock detection).
The input is a 2D grid. Each cell is a vertex connected to its neighbors.
The problem says "shortest path," "minimum spanning tree," or "maximum flow."

A 2D grid is a graph. Each cell (r, c) is a vertex with up to four neighbors, computed from coordinates rather than an explicit list. The visited set is still mandatory. Number of Islands is a graph problem with a grid costume on.

4x4 grid showing cell (1,2) and its four neighbor arrows Each cell is a vertex. The four arrows are graph edges computed on the fly from coordinates. You are doing graph DFS whether you think of it that way or not.

Two Problems, One Principle

Max Depth of a Binary Tree (Tree)

Input: A binary tree. Task: Return its maximum depth.

Because the input is a tree, it is acyclic and connected. The DFS is clean: the depth at any node is 1 plus the deeper of its two subtrees. No visited set. No cycle guard. None needed.

def maxDepth(root):
    if not root:
        return 0
    return 1 + max(maxDepth(root.left), maxDepth(root.right))

Time: O(N). Space: O(H) for the call stack, where H is the tree height.

The correctness argument is short: the tree structure guarantees no node appears twice. You do not have to defend a visited set because the structure defends itself.

Course Schedule (Graph)

Input: N courses and a list of prerequisite pairs [a, b] meaning "you must take b before a." Task: Return true if you can finish all courses.

This is cycle detection in a directed graph. If prerequisites form a cycle, no valid ordering exists. A simple boolean visited set fails here, because you need to distinguish between a node you finished in a previous DFS path (safe) and a node currently on the call stack (a cycle if you see it again).

Use the 3-color DFS from CLRS. White = unvisited. Gray = on the current DFS call stack. Black = fully processed.

def canFinish(numCourses, prerequisites):
    graph = [[] for _ in range(numCourses)]
    for a, b in prerequisites:
        graph[b].append(a)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * numCourses

    def dfs(node):
        color[node] = GRAY
        for neighbor in graph[node]:
            if color[neighbor] == GRAY:
                return False          # back edge = cycle
            if color[neighbor] == WHITE:
                if not dfs(neighbor):
                    return False
        color[node] = BLACK
        return True

    return all(dfs(i) for i in range(numCourses) if color[i] == WHITE)

Time: O(V + E). Space: O(V) for the color array and recursion stack.

The outer loop is mandatory here because the prerequisite graph can have disconnected components. A course with no prerequisites and no dependents is an isolated vertex. A tree would never have this problem.

Two directed graphs: left shows a valid DAG with all nodes turned BLACK, right shows a cycle with three GRAY nodes and a red back edge Left: DFS completes, all nodes BLACK, no cycle. Right: DFS hits a GRAY node from another GRAY node, the back edge C→A signals a cycle and we return False immediately.

For the full ordering (not just cycle detection), see Topological Sort, which extends this exact DFS into a full ordering algorithm.

The Recap

Every tree is a graph. The reverse is not true.
Two constraints separate them: trees are connected (one component) and acyclic (no cycles). Both together force exactly N-1 edges and a unique path between every pair of nodes.
Tree DFS is simpler. No visited set, because the structure prevents revisiting. No outer loop, because the structure guarantees connectivity.
Graph DFS needs both. The visited set handles cycles. The outer loop handles disconnected components.
Storage differs. Trees use pointer nodes (O(N)). Graphs use adjacency lists (O(V+E)) or matrices (O(V²)).
Grid problems are graph problems. Coordinates give you the edges. The visited set is still non-negotiable.
When the problem says N-1 edges, no cycles, or "binary tree," you have a tree. When it says graph, grid, or arbitrary paths, you have a graph.

If you want to practice explaining why your DFS works, not just that it works, SpaceComplexity runs voice-based mock interviews where the interviewer probes the reasoning. You will get asked the visited set question. Have the answer ready before it comes up.