Algorithms & data structures · lesson 14

Graph BFS & DFS

Trees are graphs without cycles. Once you add cycles, you need to track what you've visited or risk looping forever. BFS and DFS are the two fundamental strategies for visiting every node exactly once.

in progress
16 min interactive

Graphs vs trees

A tree is a connected graph with no cycles — exactly one path between any two nodes. A general graph drops both constraints. Nodes can have any number of edges. Edges can form cycles. The graph may not even be connected (some nodes may be unreachable from others).

Graphs model the real world: cities connected by roads, web pages connected by links, functions connected by call relationships, tasks connected by dependencies. The two traversal strategies — BFS and DFS — are how you explore any of these.

Representing a graph: adjacency list

The most common representation is an adjacency list: for each node, store a list of its neighbors. In C, this is typically an array of linked lists or an array of arrays.

c
#define MAXN 100 typedef struct Edge { int dest; struct Edge *next; } Edge; Edge *adj[MAXN]; // adj[u] = head of neighbor list for node u void add_edge(int u, int v) { Edge *e = malloc(sizeof(Edge)); e->dest = v; e->next = adj[u]; adj[u] = e; }
💡
Adjacency list vs adjacency matrix. An adjacency matrix uses an n×n boolean grid: matrix[u][v] = 1 if there's an edge. It's O(1) to check if two nodes are connected, but O(n²) space regardless of edge count. Adjacency lists use O(V + E) space — much better for sparse graphs (few edges relative to nodes), which is the common case.

The visited set — why it's mandatory

On a tree, you can recurse without tracking visited nodes — there are no cycles, so you can never revisit a node. On a general graph, a cycle means you'll re-enter a node you've already processed, looping forever (or until stack overflow).

The fix is a visited[] boolean array. Mark a node before processing it, and skip any neighbor that's already marked.

Breadth-first search (BFS)

BFS explores level by level. Start at the source node, visit all its neighbors, then all their unvisited neighbors, and so on. The key data structure is a queue — FIFO order ensures you finish the current level before moving to the next.

BFS finds the shortest path (in number of edges) from the source to any reachable node, because it visits nodes in order of their distance from the source.

c
void bfs(int src, int n) { bool visited[MAXN] = {false}; int q[MAXN]; int head = 0, tail = 0; visited[src] = true; q[tail++] = src; while (head < tail) { int u = q[head++]; visit(u); for (Edge *e = adj[u]; e; e = e->next) { if (!visited[e->dest]) { visited[e->dest] = true; q[tail++] = e->dest; } } } }
⚠️
Mark visited before enqueuing, not after dequeuing. If you mark a node visited only when you dequeue it, the same node can be enqueued multiple times before it's processed. On a dense graph this wastes work; in the worst case it breaks correctness. Mark it as soon as you decide to add it to the queue.

Depth-first search (DFS)

DFS goes as deep as possible down one path before backtracking to explore alternatives. It naturally maps to recursion — the call stack acts as the implicit stack. It can also be written iteratively using an explicit stack.

DFS doesn't guarantee shortest paths, but it's excellent for problems where you need to explore all possibilities: detecting cycles, topological sorting, finding connected components, and solving mazes.

c
bool visited[MAXN]; void dfs(int u) { visited[u] = true; visit(u); for (Edge *e = adj[u]; e; e = e->next) { if (!visited[e->dest]) dfs(e->dest); } } // call dfs(src) after memset(visited, 0, sizeof(visited))

BFS vs DFS — which to use

The choice depends entirely on what you're looking for:

goal use why
shortest path (unweighted) BFS visits by distance from source
cycle detection DFS back edge → cycle
topological sort DFS post-order gives reverse topo order
connected components either run from each unvisited node
maze / path existence DFS explores fully before backtracking

Complexity

Both BFS and DFS are O(V + E) where V is the number of vertices and E is the number of edges. Every vertex is visited once, and every edge is examined once. The difference is memory: BFS can hold up to O(V) nodes in the queue at once (the entire frontier), while DFS's call stack depth is bounded by the longest path, which is at most O(V).

🚫
DFS on deep graphs can overflow the call stack. Recursive DFS uses one stack frame per level of depth. On a graph shaped like a long chain (V = 100,000), this creates 100,000 nested calls and segfaults on most systems. For deep graphs, use the iterative DFS variant with an explicit stack.

Topological sort with DFS

A topological sort of a directed acyclic graph (DAG) orders vertices so that every edge points from an earlier vertex to a later one. This is the order you'd need to build a project: dependencies before the things that depend on them.

The DFS-based approach: run DFS, and when a node is fully processed (all its descendants visited), push it onto a stack. The final stack, read top to bottom, gives the topological order.

c
int topo_stack[MAXN]; int topo_top = 0; bool visited[MAXN]; void dfs_topo(int u) { visited[u] = true; for (Edge *e = adj[u]; e; e = e->next) if (!visited[e->dest]) dfs_topo(e->dest); topo_stack[topo_top++] = u; // push after all descendants }
one-line takeaway

BFS uses a queue and finds shortest paths; DFS uses a stack (or recursion) and explores depth-first — both are O(V + E) and both require a visited set.