Algorithms & data structures · lesson 11

Hash tables

A hash table turns a key into an array index. Done right, lookup, insert, and delete are all O(1) on average. Done wrong, they collapse to O(n). The difference is the hash function and what happens when two keys map to the same slot.

in progress
14 min interactive

The core idea

An array gives O(1) access by index. A hash table extends this to arbitrary keys by computing an index from the key: index = hash(key) % capacity. The array slot at that index holds the value (or a pointer to it).

The hash function must be deterministic (same key → same index always) and distribute keys uniformly across the array. Uneven distribution causes clusters — many keys competing for a few slots — which degrades performance.

Collisions and chaining

Two different keys can hash to the same index. This is a collision. The simplest resolution is chaining: each array slot holds a linked list. Colliding keys are appended to the list. Lookup walks the list until it finds the key.

c
#define CAPACITY 64 typedef struct Entry { const char *key; int val; struct Entry *next; // chaining } Entry; typedef struct { Entry *buckets[CAPACITY]; } HashMap; static unsigned djb2(const char *s) { unsigned h = 5381; while (*s) h = h * 33 ^ (unsignedchar)*s++; return h; } void hm_put(HashMap *m, const char *key, int val) { unsigned idx = djb2(key) % CAPACITY; Entry *e = malloc(sizeof(Entry)); e->key = key; e->val = val; e->next = m->buckets[idx]; m->buckets[idx] = e; }

Load factor and resizing

The load factor is n / capacity — the average number of entries per bucket. At load factor 1.0 (as many entries as buckets), chains average length 1 and lookup is O(1). At load factor 5.0, chains average length 5 and lookup is O(5) — still constant, but slower.

When the load factor exceeds a threshold (typically 0.75), double the capacity and rehash all entries. Amortized over many insertions, this keeps O(1) average performance.

⚠️
Hash tables are vulnerable to collision attacks. If an adversary can control the keys (e.g., HTTP headers in a web server), they can craft inputs that all hash to the same bucket, turning O(1) into O(n). Use a hash function with a secret seed (SipHash, FNV with randomization) for any hash table fed by external input.
one-line takeaway

A hash table is an array indexed by a key's hash — O(1) average when collisions are rare and the load factor is controlled.