Algorithms & data structures · lesson 02

Linked lists

An array stores everything in a contiguous block. A linked list scatters nodes across the heap, each one holding the address of the next. Different trade-offs, different tools.

in progress
14 min interactive

Suppose you have a sorted array of a million elements and you need to insert one value in the middle. The array has to shift half a million elements to make room — O(n) work, every time. A linked list inserts in O(1): update two pointers, done. That trade-off is the whole story.

The node structure

Every node holds two things: a value, and the address of the next node. The last node's next is NULL — that's the sentinel that stops traversal.

c
typedef struct Node { int val; struct Node *next; } Node; Node *make_node(int val) { Node *n = malloc(sizeof(Node)); n->val = val; n->next = NULL; return n; }

The self-referential pointer struct Node *next uses the struct tag, not the typedef name. That's not a style choice — it's required. The typedef alias Node isn't complete until the closing brace, so the compiler doesn't know its size yet. struct Node * is always a pointer to an incomplete type, which is legal; a value of incomplete type is not.

💡
Each node is a separate malloc call — a separate heap allocation with its own address. Nodes are almost never adjacent in memory. The "list" is a logical structure built entirely from pointer links, not physical proximity.

Insertion and traversal

Inserting at the front is O(1): make a node, point its next at the old head, return it as the new head. Appending at the back is O(n): you must walk to the end first.

c
// O(1) — just update head Node *prepend(Node *head, int val) { Node *n = make_node(val); n->next = head; return n; } // O(n) — must walk to the last node Node *append(Node *head, int val) { Node *n = make_node(val); if (!head) return n; Node *cur = head; while (cur->next) cur = cur->next; cur->next = n; return head; } // traversal — walk the chain until NULL void print_list(Node *head) { for (Node *cur = head; cur; cur = cur->next) printf("%d ", cur->val); printf("\n"); }

Deletion

Deletion is where linked lists get subtle. Removing node B from A→B→C requires pointing A's next at C, then freeing B. The problem: you need A (the predecessor) to do that. If you only have a pointer to B, you can't unlink it from a singly linked list — you'd have to search from the head.

c
// delete the first node with value val — O(n) Node *delete_val(Node *head, int val) { if (!head) return NULL; // special case: head is the target if (head->val == val) { Node *rest = head->next; free(head); return rest; } // search for the predecessor Node *prev = head; while (prev->next && prev->next->val != val) prev = prev->next; if (prev->next) { // found it Node *target = prev->next; prev->next = target->next; // bypass free(target); } return head; }
⚠️
Never free a node before unlinking it. Once you call free(target), target is a dangling pointer. Update prev->next first, then free.

Freeing the whole list

Freeing only the head leaves the rest of the nodes as leaked memory — they have no other reference and can never be freed. You must walk the chain and free every node. The standard pattern saves next before freeing, because once you free a node you can't dereference it:

c
void free_list(Node *head) { Node *cur = head; while (cur) { Node *nxt = cur->next; // save before free free(cur); cur = nxt; } }
🔥
This is wrong: free(cur); cur = cur->next;
After free(cur), cur is freed memory. Reading cur->next is undefined behavior — it might work, it might crash, it might silently corrupt something else.

Doubly linked lists

A singly linked list can only move forward. Add a prev pointer and you get a doubly linked list — backward traversal and O(1) deletion without needing the predecessor. The price is two pointers of overhead per node and more bookkeeping on every insert and delete.

c
typedef struct DNode { int val; struct DNode *prev; struct DNode *next; } DNode; // remove a node in O(1) — no predecessor search needed void unlink(DNode *n) { if (n->prev) n->prev->next = n->next; if (n->next) n->next->prev = n->prev; free(n); }

The Linux kernel uses doubly linked lists internally for almost everything — runqueues, page lists, inode lists. They use a "intrusive" variant where the prev/next pointers live inside the data struct itself, not in a separate wrapper node.

Arrays vs linked lists — the real trade-offs

operation array linked list
access by index O(1) O(n)
prepend O(n) shift O(1)
insert (at known node) O(n) shift O(1)
delete (at known node) O(n) shift O(1)*
search O(n) O(n)
cache behavior excellent poor
* O(1) for doubly linked; O(n) for singly linked (predecessor search)

The cache row matters more than it looks. A modern CPU loads memory in 64-byte cache lines. When you iterate an array, the prefetcher sees the sequential access pattern and loads the next line before you need it — effectively free. When you follow a linked list's pointers, each node can be anywhere in the heap, so every dereference may cause a cache miss. At hundreds of nanoseconds per miss, the asymptotic advantage disappears fast.

💡
When linked lists win in practice: when you already have a pointer to the node and need O(1) insertion/deletion — like an LRU cache (doubly linked + hash map), or a task queue where you splice items between lists. The O(1) only matters if you already have the pointer; if you have to search first, you pay O(n) either way.

The visualization

The viz shows a singly linked list being built node by node, then traversed. Each node is a separate heap allocation — notice the non-sequential addresses. The traversal cursor (cur) walks through the chain one pointer follow at a time.

one-line takeaway

A linked list trades cache-friendly contiguity for O(1) insertion at a known position — the O(1) only matters if you already have the pointer.