Algorithms & data structures · lesson 05

Binary search

Linear search checks every element. Binary search cuts the remaining candidates in half each step. On a million-element array: 1,000,000 comparisons vs 20.

in progress
10 min interactive

A sorted array of a billion elements. You need to find one value. Linear search starts at index 0 and checks each element — up to a billion comparisons. Binary search asks: is the target in the left half or the right half? It throws away half the candidates with a single comparison, then repeats. After 30 comparisons, the search space is down to one element. log₂(1,000,000,000) ≈ 30.

The precondition: sorted order

Binary search only works on a sorted array. The key insight is that if the middle element is less than the target, every element to its left is also less than the target — you can discard the entire left half. That reasoning collapses completely on an unsorted array.

This suggests a common pattern: pay O(n log n) to sort once, then get O(log n) per lookup forever after. Beats O(n) per lookup as soon as you do more than ~log n searches on the same data.

The algorithm

c
// returns index of target, or -1 if not found int binary_search(int *arr, int n, int target) { int lo = 0, hi = n - 1; while (lo <= hi) { int mid = lo + (hi - lo) / 2; // NOT (lo + hi) / 2 if (arr[mid] == target) return mid; else if (arr[mid] < target) lo = mid + 1; else hi = mid - 1; } return -1; }
🚫
The midpoint overflow bug. (lo + hi) / 2 overflows when lo + hi exceeds INT_MAX. The correct formula is lo + (hi - lo) / 2. This bug appeared in Java's standard library and wasn't fixed for nine years.

Off-by-one errors

Binary search has a notorious reputation for off-by-one bugs. The invariant to maintain: lo and hi are both inclusive bounds. Every element in [lo, hi] is still a candidate.

This means the loop condition is lo <= hi (not <), and the updates are lo = mid + 1 and hi = mid - 1 (not mid, which can loop forever when lo == hi).

Tracing the algorithm

Searching for 33 in [2, 5, 8, 12, 16, 23, 33, 45, 67, 91] (indices 0–9):

step lo hi mid arr[mid] decision
1 0 9 4 16 16 < 33 → lo = mid+1 = 5
2 5 9 7 45 45 > 33 → hi = mid-1 = 6
3 5 6 5 23 23 < 33 → lo = mid+1 = 6
4 6 6 6 33 found at index 6

10 elements, 4 comparisons. For 1,000 elements it'd be at most 10. For 1,000,000 elements: 20. Each time you double the array size you need exactly one more comparison.

Lower bound — the useful generalization

The "find exact value" form of binary search is the special case. The more generally useful form is lower bound: find the leftmost index where you could insert target while keeping the array sorted. It finds the first element ≥ target, or returns n if all elements are smaller.

c
// first index i where arr[i] >= target, or n if none exists int lower_bound(int *arr, int n, int target) { int lo = 0, hi = n; // hi = n, not n-1 while (lo < hi) { // strict less-than int mid = lo + (hi - lo) / 2; if (arr[mid] < target) lo = mid + 1; else hi = mid; // hi closes in, not mid-1 } return lo; } // exact search expressed as lower_bound: int lo = lower_bound(arr, n, target); if (lo < n && arr[lo] == target) { /* found */ }
💡
Lower bound is how databases implement range queries on sorted indexes. It's also how you'd find the insertion point for a new record without scanning from the start. The C++ standard library provides std::lower_bound and std::upper_bound (first index where arr[i] > target). Together they give you the range [lo, hi) of all occurrences of a value in O(log n).
one-line takeaway

Binary search halves the candidates each step — O(log n) — but only works on sorted data.