Binary Search Trees — Algorithms & DS

The BST invariant

A binary search tree is a binary tree that satisfies the ordering property at every node: all keys in the left subtree are strictly less than the node's key, and all keys in the right subtree are strictly greater. This must hold not just for immediate children — it must hold for the entire subtree.

The payoff: to find a value, you never need to look at both sides. At each node, compare your target to the current value. Too small? Go left. Too large? Go right. The comparison halves the remaining candidates each step — the same logic as binary search, but the structure is dynamic: you can insert and delete without rebuilding the entire thing.

Search

Search follows the invariant mechanically. Start at the root. If the current node is NULL, the value isn't in the tree. If it matches, done. Otherwise recurse into the appropriate child.

c

Node *bst_search(Node *n, int key) {
    if (!n || n->val == key) return n;
    if (key < n->val) return bst_search(n->left,  key);
    else              return bst_search(n->right, key);
}
          

Insert

Insertion finds the correct leaf position and attaches a new node there. The recursive pattern returns the (possibly new) subtree root, which lets parent pointers stay correct without passing a pointer-to-pointer.

c

Node *bst_insert(Node *n, int key) {
    if (!n) return make_node(key);
    if      (key < n->val) n->left  = bst_insert(n->left,  key);
    else if (key > n->val) n->right = bst_insert(n->right, key);
    return n;  // duplicate keys are ignored
}
          

💡

In-order traversal of a BST yields sorted output. Because the left subtree always contains smaller values and the right subtree always contains larger ones, visiting left → node → right visits every key in ascending order. This is why BSTs are useful: you get sorted iteration for free.

Delete

Deletion is the hardest operation. Three cases:

Leaf node — just free it, return NULL to parent.
One child — replace the node with its only child.
Two children — find the in-order successor (smallest node in right subtree), copy its value into the current node, then delete the successor from the right subtree.

c

Node *min_node(Node *n) {
    while (n->left) n = n->left;
    return n;
}

Node *bst_delete(Node *n, int key) {
    if (!n) return NULL;
    if      (key < n->val) { n->left  = bst_delete(n->left,  key); }
    else if (key > n->val) { n->right = bst_delete(n->right, key); }
    else {
        // found: handle the three cases
        if (!n->left)  { Node *tmp = n->right; free(n); return tmp; }
        if (!n->right) { Node *tmp = n->left;  free(n); return tmp; }
        // two children: replace with in-order successor
        Node *succ = min_node(n->right);
        n->val   = succ->val;
        n->right = bst_delete(n->right, succ->val);
    }
    return n;
}
          

The degenerate case

If you insert sorted data into a plain BST — say, 1, 2, 3, 4, 5 — every new node goes to the right. The tree becomes a right-leaning chain, height n, and every operation costs O(n). You've built an expensive linked list.

⚠️

Insertion order determines tree shape. A plain BST gives you no guarantee about height. Insert 50 random keys and the expected height is O(log n). Insert them in sorted order and you get O(n). Real applications use self-balancing variants.

Self-balancing trees

A self-balancing BST automatically restructures itself on insert and delete to keep height O(log n) regardless of input order. The two most common variants are:

AVL tree — after every insert or delete, checks the height difference between left and right subtrees at each affected node. If the difference exceeds 1, it performs a rotation — a local pointer rearrangement that reduces height while preserving the BST invariant. AVL trees are strictly balanced, so lookups are fast.
Red-black tree — each node is colored red or black, and a set of color rules enforces that no path from root to leaf is more than twice as long as any other. Red-black trees are less strictly balanced than AVL trees but require fewer rotations on insert and delete. They're what std::map in C++ and Java's TreeMap use under the hood.

💡

Rotations preserve the BST invariant. A right rotation on a node x with left child y makes y the new subtree root and x the right child of y. Every key ordering relationship is preserved — only the shape changes. This is the mechanism that keeps self-balancing trees fast.

BST vs hash table

Hash tables give O(1) average lookup. BSTs give O(log n) worst-case. Why would you ever use a BST?

Ordered iteration — in-order traversal yields sorted keys. Hash tables have no ordering.
Range queries — find all keys between 10 and 50. A BST can do this efficiently; a hash table cannot.
Predecessor / successor — find the largest key smaller than x. O(log n) in a BST, O(n) in a hash table.
No hash function required — any comparable type works directly.

one-line takeaway

A BST is O(log n) only while balanced — insert sorted data and it degenerates to O(n). Use a self-balancing variant in practice.