06 · Hash Tables

Chaining · linear probing · cuckoo hashing — three strategies, same key stream, very different load-factor behaviour.

What you're watching

All three tables use the same N = 16 buckets and receive the same randomly-generated stream of keys. They differ in what to do when two keys hash to the same bucket. Stats above each panel update live: load factor, item count, and the strategy-specific worst-case metric (longest chain · longest probe · last cuckoo eviction count).

The default hash function is Knuth's multiplicative hash with the golden-ratio constant, which gives near-uniform distribution. Toggle "use bad hash (k % size)" to see the same load factor cause much worse collision concentration when the hash function has poor mixing — primarily a problem when keys are sequential.

The three strategies

Chaining   linked list per bucket · graceful degradation

Each bucket holds a list of every key that hashed to it. Lookup is "find bucket, then linear search within." Average case stays O(1) as long as the load factor stays modest; worst case is O(n) when every key collides. Doesn't degrade catastrophically — load factor > 1 just makes chains longer, and the table never "fills up."

Linear probing   single slot per bucket · cluster-prone

Each bucket holds at most one key. On collision, walk forward (wrapping at the end of the array) until you find an empty slot. Lookup follows the same walk. Cache-friendly because the walk is sequential memory access — but degrades sharply past ~70% load factor: clusters of occupied slots merge into bigger clusters, dramatically increasing the average probe distance (this is "primary clustering").

Cuckoo hashing   two tables, two hashes · O(1) worst-case lookup

Use two independent hash functions and two parallel tables. On insert: try table 1 at h₁(k); if occupied, evict the previous tenant, push the new key in, then re-place the evicted tenant in table 2 at h₂(k'). Repeat if that one is also occupied. Lookup is O(1) worst-case — every key sits at exactly one of two known slots — but insertion can chain through many evictions, and if the chain loops forever the table needs a rehash. Cuckoo's effective capacity caps near 50% load on its primary table.

TRY THIS
  • Hit "Apply" with target 70% — chaining and cuckoo handle it cleanly; probing's longest-probe metric jumps.
  • Push the slider to 90% and apply — probing's longest probe explodes; cuckoo will hit "INSERT FAILED" and stop.
  • Toggle "use bad hash" then add 16 keys — chaining piles up in one bucket, probing forms one giant cluster, cuckoo cycles forever and fails fast.

What's a "good" hash function

A good hash function spreads keys nearly uniformly across buckets. The golden-ratio multiplicative hash used here — h(k) = (k × 2654435761) mod size — has the property that consecutive keys land in widely-separated buckets, because 2654435761 ≈ 2³² × (φ − 1) is irrational-mod-anything-power-of-2. The "bad hash" toggle replaces this with k mod size, which is strict residue: consecutive keys produce consecutive bucket indices, immediately destroying the uniform-distribution property that all three strategies depend on.

What I learned writing this

← Back to all algorithms