Merge sort and quicksort beat insertion sort on large random arrays — but they have overhead: recursive calls, temporary buffers, pivot logic. On tiny arrays (say, 16 elements), that overhead outweighs the algorithmic advantage. Insertion sort has no overhead. It's the reason every production sort implementation eventually hands off to insertion sort for small subproblems. Understanding when O(n²) beats O(n log n) requires understanding what those constants actually are.
How it works
Think of sorting a hand of cards. You pick up one card at a time and slide it left
past larger cards until it finds its place in the already-sorted portion.
That's insertion sort exactly. The left portion of the array is always sorted;
you take the next element, save it as key, and shift larger elements
one position right to open a slot for it.
Tracing through an example
Sorting [5, 2, 4, 6, 1, 3]. The green region is always sorted:
| i | key | array after insertion |
|---|---|---|
| 1 | 2 | 2, 5, 4, 6, 1, 3 |
| 2 | 4 | 2, 4, 5, 6, 1, 3 |
| 3 | 6 | 2, 4, 5, 6, 1, 3 |
| 4 | 1 | 1, 2, 4, 5, 6, 3 (4 shifts) |
| 5 | 3 | 1, 2, 3, 4, 5, 6 (3 shifts) |
Why it's fast on nearly-sorted data
When the array is nearly sorted, most elements are close to their final position.
The inner while loop terminates almost immediately — each element
shifts just a step or two. Total work becomes O(n). This is why timsort
(Python's and Java's default sort) and introsort (C++ standard library) use
insertion sort for subarrays smaller than ~16 elements.
>, not ≥).
It uses O(1) extra space. For small n or nearly-sorted input, it beats every other sort.
Insertion sort is O(n) on nearly-sorted data — the sort every real sorting algorithm falls back to for small inputs.