Hash Tables — Algorithms & DS

The core idea

An array gives O(1) access by index. A hash table extends this to arbitrary keys by computing an index from the key: index = hash(key) % capacity. The array slot at that index holds the value (or a pointer to it).

The hash function must be deterministic (same key → same index always) and distribute keys uniformly across the array. Uneven distribution causes clusters — many keys competing for a few slots — which degrades performance.

Collisions and chaining

Two different keys can hash to the same index. This is a collision. The simplest resolution is chaining: each array slot holds a linked list. Colliding keys are appended to the list. Lookup walks the list until it finds the key.

c

#define CAPACITY 64

typedef struct Entry {
    const char  *key;
    int          val;
    struct Entry *next;  // chaining
} Entry;

typedef struct { Entry *buckets[CAPACITY]; } HashMap;

static unsigned djb2(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 ^ (unsignedchar)*s++;
    return h;
}

void hm_put(HashMap *m, const char *key, int val) {
    unsigned idx = djb2(key) % CAPACITY;
    Entry *e = malloc(sizeof(Entry));
    e->key = key; e->val = val;
    e->next = m->buckets[idx];
    m->buckets[idx] = e;
}
          

Load factor and resizing

The load factor is n / capacity — the average number of entries per bucket. At load factor 1.0 (as many entries as buckets), chains average length 1 and lookup is O(1). At load factor 5.0, chains average length 5 and lookup is O(5) — still constant, but slower.

When the load factor exceeds a threshold (typically 0.75), double the capacity and rehash all entries. Amortized over many insertions, this keeps O(1) average performance.

⚠️

Hash tables are vulnerable to collision attacks. If an adversary can control the keys (e.g., HTTP headers in a web server), they can craft inputs that all hash to the same bucket, turning O(1) into O(n). Use a hash function with a secret seed (SipHash, FNV with randomization) for any hash table fed by external input.

one-line takeaway

A hash table is an array indexed by a key's hash — O(1) average when collisions are rare and the load factor is controlled.