Binary Search — Algorithms & DS

A sorted array of a billion elements. You need to find one value. Linear search starts at index 0 and checks each element — up to a billion comparisons. Binary search asks: is the target in the left half or the right half? It throws away half the candidates with a single comparison, then repeats. After 30 comparisons, the search space is down to one element. log₂(1,000,000,000) ≈ 30.

The precondition: sorted order

Binary search only works on a sorted array. The key insight is that if the middle element is less than the target, every element to its left is also less than the target — you can discard the entire left half. That reasoning collapses completely on an unsorted array.

This suggests a common pattern: pay O(n log n) to sort once, then get O(log n) per lookup forever after. Beats O(n) per lookup as soon as you do more than ~log n searches on the same data.

The algorithm

c

// returns index of target, or -1 if not found
int binary_search(int *arr, int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // NOT (lo + hi) / 2
        if      (arr[mid] == target) return mid;
        else if (arr[mid] <  target) lo = mid + 1;
        else                         hi = mid - 1;
    }
    return -1;
}
          

🚫

The midpoint overflow bug. (lo + hi) / 2 overflows when lo + hi exceeds INT_MAX. The correct formula is lo + (hi - lo) / 2. This bug appeared in Java's standard library and wasn't fixed for nine years.

Off-by-one errors

Binary search has a notorious reputation for off-by-one bugs. The invariant to maintain: lo and hi are both inclusive bounds. Every element in [lo, hi] is still a candidate.

This means the loop condition is lo <= hi (not <), and the updates are lo = mid + 1 and hi = mid - 1 (not mid, which can loop forever when lo == hi).

Tracing the algorithm

Searching for 33 in [2, 5, 8, 12, 16, 23, 33, 45, 67, 91] (indices 0–9):

step	lo	hi	mid	arr[mid]	decision
1	0	9	4	16	16 < 33 → lo = mid+1 = 5
2	5	9	7	45	45 > 33 → hi = mid-1 = 6
3	5	6	5	23	23 < 33 → lo = mid+1 = 6
4	6	6	6	33	found at index 6

10 elements, 4 comparisons. For 1,000 elements it'd be at most 10. For 1,000,000 elements: 20. Each time you double the array size you need exactly one more comparison.

Lower bound — the useful generalization

The "find exact value" form of binary search is the special case. The more generally useful form is lower bound: find the leftmost index where you could insert target while keeping the array sorted. It finds the first element ≥ target, or returns n if all elements are smaller.

c

// first index i where arr[i] >= target, or n if none exists
int lower_bound(int *arr, int n, int target) {
    int lo = 0, hi = n;          // hi = n, not n-1
    while (lo < hi) {             // strict less-than
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] < target) lo = mid + 1;
        else                   hi = mid;  // hi closes in, not mid-1
    }
    return lo;
}

// exact search expressed as lower_bound:
int lo = lower_bound(arr, n, target);
if (lo < n && arr[lo] == target) { /* found */ }
          

💡

Lower bound is how databases implement range queries on sorted indexes. It's also how you'd find the insertion point for a new record without scanning from the start. The C++ standard library provides std::lower_bound and std::upper_bound (first index where arr[i] > target). Together they give you the range [lo, hi) of all occurrences of a value in O(log n).

one-line takeaway

Binary search halves the candidates each step — O(log n) — but only works on sorted data.