The core idea
An array gives O(1) access by index. A hash table extends this to arbitrary keys
by computing an index from the key: index = hash(key) % capacity.
The array slot at that index holds the value (or a pointer to it).
The hash function must be deterministic (same key → same index always) and distribute keys uniformly across the array. Uneven distribution causes clusters — many keys competing for a few slots — which degrades performance.
Collisions and chaining
Two different keys can hash to the same index. This is a collision. The simplest resolution is chaining: each array slot holds a linked list. Colliding keys are appended to the list. Lookup walks the list until it finds the key.
Load factor and resizing
The load factor is n / capacity — the average number of entries per bucket.
At load factor 1.0 (as many entries as buckets), chains average length 1 and lookup is O(1).
At load factor 5.0, chains average length 5 and lookup is O(5) — still constant, but slower.
When the load factor exceeds a threshold (typically 0.75), double the capacity and rehash all entries. Amortized over many insertions, this keeps O(1) average performance.
A hash table is an array indexed by a key's hash — O(1) average when collisions are rare and the load factor is controlled.