06 · Hash Tables
Chaining · linear probing · cuckoo hashing — three strategies, same key stream, very different load-factor behaviour.
What you're watching
All three tables use the same N = 16 buckets and receive the same randomly generated stream of keys. They differ in how they handle two keys hashing to the same bucket. Stats above each panel update live: load factor, item count, and the strategy-specific worst-case metric (longest chain · longest probe · last cuckoo eviction count).
The default hash function is Knuth's multiplicative hash with the golden-ratio constant, which gives near-uniform distribution. Toggle "use bad hash (k % size)" to see the same load factor cause much worse collision concentration when the hash function has poor mixing — primarily a problem when keys are sequential.
The three strategies
Chaining linked list per bucket · graceful degradation
Each bucket holds a list of every key that hashed to it. Lookup is "find bucket, then linear search within." Average case stays O(1) as long as the load factor stays modest; worst case is O(n) when every key collides. Doesn't degrade catastrophically — load factor > 1 just makes chains longer, and the table never "fills up."
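Chaining really is just "a list per bucket." A minimal Python sketch, assuming integer keys, the demo's N = 16, and a Fibonacci-style multiplicative hash (my reading of the demo's defaults, not its actual source):

```python
# Separate chaining: one Python list per bucket.

SIZE = 16

def h(k):
    # Multiply by ~2^32/phi, keep 32 bits, take the top 4 bits (16 buckets).
    return (k * 2654435761 % 2**32) >> 28

class ChainingTable:
    def __init__(self):
        self.buckets = [[] for _ in range(SIZE)]
        self.count = 0

    def insert(self, k):
        b = self.buckets[h(k)]
        if k not in b:              # keep keys unique
            b.append(k)
            self.count += 1

    def contains(self, k):
        # "Find bucket, then linear search within."
        return k in self.buckets[h(k)]

    def longest_chain(self):
        return max(len(b) for b in self.buckets)

t = ChainingTable()
for k in range(32):                 # load factor 2.0; the table still works
    t.insert(k)
print(t.count / SIZE)               # 2.0
```

Note there is no "full" condition anywhere: past load factor 1.0 the chains just get longer.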
Linear probing single slot per bucket · cluster-prone
Each bucket holds at most one key. On collision, walk forward (wrapping at the end of the array) until you find an empty slot. Lookup follows the same walk. Cache-friendly because the walk is sequential memory access — but degrades sharply past ~70% load factor: clusters of occupied slots merge into bigger clusters, dramatically increasing the average probe distance (this is "primary clustering").
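The forward walk can be sketched in a few lines, using the same assumed hash as the chaining sketch; note that lookup must follow the exact walk insertion took, stopping at the first empty slot:

```python
# Linear probing: one slot per bucket, wrap-around walk on collision.

SIZE = 16

def h(k):
    return (k * 2654435761 % 2**32) >> 28

class ProbingTable:
    def __init__(self):
        self.slots = [None] * SIZE
        self.count = 0

    def insert(self, k):
        if self.count == SIZE:
            raise RuntimeError("table full")
        i = h(k)
        while self.slots[i] is not None:
            if self.slots[i] == k:
                return                      # already present
            i = (i + 1) % SIZE              # walk forward, wrapping
        self.slots[i] = k
        self.count += 1

    def contains(self, k):
        i = h(k)
        for _ in range(SIZE):               # same walk as insert
            if self.slots[i] is None:
                return False                # an empty slot ends the walk
            if self.slots[i] == k:
                return True
            i = (i + 1) % SIZE
        return False

t = ProbingTable()
for k in range(11):                         # ~70% load
    t.insert(k)
print(t.contains(5), t.count)               # True 11
```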
Cuckoo hashing two tables, two hashes · O(1) worst-case lookup
Use two independent hash functions and two parallel tables. On insert: try table 1 at h₁(k); if occupied, evict the previous tenant, push the new key in, then re-place the evicted tenant in table 2 at h₂(k'). Repeat if that slot is also occupied. Lookup is O(1) worst-case — every key sits at exactly one of two known slots — but insertion can chain through many evictions, and if the chain loops forever the table needs a rehash. Cuckoo's effective capacity caps near 50% total load.
- Hit "Apply" with target 70% — chaining and cuckoo handle it cleanly; probing's longest-probe metric jumps.
- Push the slider to 90% and apply — probing's longest probe explodes; cuckoo will hit "INSERT FAILED" and stop.
- Toggle "use bad hash" then add 16 keys — chaining piles up in one bucket, probing forms one giant cluster, cuckoo cycles forever and fails fast.
What's a "good" hash function
A good hash function spreads keys nearly uniformly across buckets. The golden-ratio multiplicative hash used here multiplies by 2654435761 ≈ 2³² × (φ − 1), keeps the low 32 bits of the product, and takes the top log₂(size) bits: h(k) = ((k × 2654435761) mod 2³²) >> (32 − log₂ size). (The high bits matter: taking the product mod a power-of-two size would keep only the low bits, where the constant does no mixing.) Because φ is irrational, consecutive keys land in widely separated buckets — successive multiples of φ's fractional part are maximally spread out. The "bad hash" toggle replaces this with k mod size, a plain residue: consecutive keys produce consecutive bucket indices, immediately destroying the uniform-distribution property that all three strategies depend on.
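The contrast is easy to see on sequential keys. A sketch of the two hashes, with the shift-based form as one standard way to write Fibonacci hashing:

```python
# Fibonacci (multiplicative) hashing vs. plain modulo, for 16 buckets.

SIZE = 16                                # must be a power of two here
SHIFT = 32 - (SIZE.bit_length() - 1)     # 32 - log2(16) = 28

def good_hash(k):
    # Keep the TOP bits of the 32-bit product, where the mixing happens.
    return (k * 2654435761 % 2**32) >> SHIFT

def bad_hash(k):
    return k % SIZE

print([good_hash(k) for k in range(8)])  # [0, 9, 3, 13, 7, 1, 11, 5]
print([bad_hash(k) for k in range(8)])   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Sequential keys scatter across the table under the good hash and march straight through consecutive buckets under the bad one.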
What I learned writing this
- Linear probing's degradation is non-linear. You can sit at 50% load with average probe ≈ 1.5; cross 70% and the average probe shoots toward 5; at 90% it's 50+. This is because clusters merge: the more occupied a region, the more likely a new collision lands in it, extending it further. The doubling-when-half-full convention has a real basis: load factor of 0.5 is the safe operating point.
- Cuckoo's win is in lookups, not inserts. Insert can be slow (or fail entirely under high load), but lookup is two array accesses, period — no probing, no chain walking. In workloads where reads dominate writes and worst-case latency matters, cuckoo wins despite the insert cost.
- Chaining is the most forgiving choice and the most memory-hungry. Linked-list overhead per element, poor cache locality, but it never "breaks" — load factor 2.0, 5.0, 10.0 all work, just with linearly more lookup cost.
- Switching to a bad hash makes the strategy choice almost irrelevant — they all degrade. Hash quality dominates the strategy choice in practice, which is why language standard libraries (Python's dict, Java's HashMap) ship with quite sophisticated hash functions. You can't compensate for a bad hash by picking a clever resolution scheme.