The question Big-O answers
You wrote two sorting programs. Both produce the correct output. On 100 elements, both finish in under a millisecond. On 100,000 elements, one takes 200ms and the other takes 3 minutes. Why?
The answer isn't the hardware. It isn't the language. It's the shape of growth. One algorithm does work proportional to n·log(n) — if the input doubles, the work barely more than doubles. The other does work proportional to n² — if the input doubles, the work quadruples. At 100 elements the difference is invisible. At 100,000 elements it's the difference between useful and unusable.
Big-O notation is the language we use to describe that shape precisely, independent of machine speed, language, or constant-factor optimizations.
The formal meaning
When we write f(n) = O(g(n)), we mean: there exist positive constants c and n₀ such that for all n > n₀, f(n) ≤ c·g(n). In plain terms: beyond some threshold input size, f(n) never exceeds a fixed multiple of g(n).
This is an upper bound. It says "grows no faster than." There are two companion notations:
- O(g(n)) — upper bound. f grows no faster than g. "At most this bad."
- Ω(g(n)) — lower bound. f grows no slower than g. "At least this much work."
- Θ(g(n)) — tight bound. f is both O(g) and Ω(g). "Exactly this growth rate."
In practice, when engineers say "this algorithm is O(n log n)" they usually mean Θ(n log n) — tight, not just an upper bound. Context makes it clear.
Why constants are dropped
An algorithm doing 3n operations and one doing 100n operations are both O(n). This seems wrong at first — 100n is 33× more work than 3n. Why ignore that?
Because the constant is fixed, and n grows. At n = 1,000:
The constant-factor advantage of B over A (if B had a smaller constant) would never overcome the quadratic growth. Hardware and compiler optimizations can shrink constants — they cannot change the exponent. Pick the right complexity class first.
Lower-order terms are dropped for the same reason. 5n² + 3n + 100 is O(n²) because at large n, the 3n and 100 become negligible compared to 5n². At n = 10,000: 5n² = 500,000,000, 3n = 30,000, 100 = 100. The 3n is 0.006% of the total. It doesn't affect the shape.
The common complexity classes
Deriving complexity from code, step by step
The complexity usually follows from the loop structure. Here's the pattern for each case, with the reasoning spelled out.
Tricky cases that trip people up
The triangular loop
When the inner loop starts at i+1 instead of 0,
the body doesn't run n² times — it runs fewer. But the complexity is still O(n²).
This pattern — comparing every pair exactly once — shows up in bubble sort. It looks "more efficient" than the full n² nested loop, and it does half the work, but it's in the same complexity class.
Logarithmic loops
A loop that divides its counter rather than subtracting runs in O(log n) time, even if it doesn't look like a search.
Nested loops where one bound is fixed
Best, average, and worst case
The same algorithm can have different complexities depending on the input. These are three distinct analyses — don't conflate them.
- Worst case — the input that forces the maximum work. Linear search through n elements: the target is at the end (or missing) → O(n).
- Best case — the input that minimizes work. Linear search: target is at index 0 → O(1). This is rarely useful information.
- Average case — expected work over a random input distribution. Linear search with target equally likely at any position → O(n/2) = O(n).
Quicksort illustrates why this matters: its average case is O(n log n) — roughly as fast as possible for a comparison sort. Its worst case is O(n²) — as slow as bubble sort. When people say "quicksort is fast" they mean average case. When they say "use merge sort for guaranteed performance" they mean worst case.
Recursive complexity: the recursion tree
Recursion makes complexity harder to see because the work is spread across call frames. The most reliable approach: draw the recursion tree, then count total work across all levels.
Merge sort on an array of n elements:
The key insight: at every level of the recursion tree, we do O(n) work in total. The number of levels is log n (because we halve each time). So the total is O(n log n).
Contrast with naive recursive Fibonacci:
Space complexity
Big-O applies to memory too. Count the extra memory used beyond the input itself.
- O(1) space — a fixed number of variables regardless of n. Bubble sort: two pointers, one swap variable. Input array is not "extra."
- O(n) space — auxiliary storage proportional to input. Merge sort's merge step: needs a temporary array of size n to merge into.
- O(log n) space — recursion stack depth. Quicksort (average case): log n recursive calls on the stack simultaneously. Each frame is O(1), so total stack space is O(log n).
- O(n) space (recursion) — recursion stack depth times frame size. Merge sort: log n levels deep, each O(n) for the merge buffer → O(n) total.
Amortized complexity
Sometimes a single operation is expensive, but it's rare enough that the average cost per operation stays low. This is amortized analysis — spreading the cost of expensive operations across many cheap ones.
The canonical example is a dynamic array (C++'s vector, Python's list).
Appending to it is usually O(1) — write one element. But when the array fills,
it reallocates to 2× the capacity, copying all existing elements. That realloc is O(n).
Expensive? Yes. But how often does it happen? Starting at capacity 1 and doubling: reallocs happen at sizes 1, 2, 4, 8, ..., n. Total copying: 1 + 2 + 4 + ... + n = 2n. Spread that 2n across n append operations → 2 copies per append on average. The amortized cost is O(1) per append.
The real numbers
When complexity class doesn't tell the whole story
Big-O tells you the shape. Constants and cache behavior fill in the rest.
Insertion sort is O(n²). For arrays smaller than ~20–50 elements, it's faster than O(n log n) merge sort. Why? Merge sort has overhead: recursive calls, auxiliary array allocation, indirect memory access. Insertion sort does simple adjacent swaps in a tight loop — hardware prefetchers love it, and branch predictors handle it well.
This is why qsort in most standard libraries isn't pure quicksort —
it switches to insertion sort for small partitions. The complexity class is O(n log n)
in practice because the O(n²) parts only handle tiny n.
1. Pick the right complexity class first — this is the most important decision.
2. Then consider constants and cache behavior for the actual input sizes you'll see.
3. Profile before micro-optimizing. Your assumptions about which constant is larger are often wrong.
Growth curves
The chart below plots all five main classes on the same axes (n = 0–20, y capped for readability). O(n log n) and O(n²) exit the top — that's the point. At scale, they dwarf everything below them.
Big-O measures how work scales with input size — drop the constants, keep the shape, and always ask which case the analysis covers.