09 · String Matching

KMP vs naive · same haystack · same needle · who finishes with fewer comparisons?

try:

What you're watching

Both algorithms search for the needle inside the haystack and report every match position. They produce identical results; they differ only in how many character comparisons they do along the way. The "comparisons" stat above each panel is the fair efficiency metric.

RECOMMENDED PRESET
"KMP advantage · sparse needle" (haystack AAAAAAAAAAAAAAAAB, needle AAAAB): the naive algorithm re-scans almost every character on every failed alignment — its comparison count blows up. KMP's failure function records that "after matching AAAA, you've already established the next 4 characters in the needle should also start with AAAA" and skips ahead efficiently. The comparison-count gap is the entire point of KMP.

The two algorithms

Naive substring search   O(n·m) worst case

For every position i in 0..n-m, try matching the needle character by character against haystack[i..i+m-1]. On any mismatch, slide the needle one position right and start over. Each haystack character can be re-examined up to m times — hence the O(nm) worst case. Easy to implement, never fast.

Knuth-Morris-Pratt (KMP)   O(n + m)

Pre-compute a failure function for the needle in O(m): failure[q] is the length of the longest proper prefix of needle[0..q] that is also a suffix of it. This encodes "if I've matched the first q+1 characters of the needle and then mismatch at q+1, how many of those q characters can I keep using as a free head-start on the next attempt?"

During matching, on a mismatch at needle position q, instead of restarting we set q = failure[q-1] and retry from there. No haystack character is ever re-examined, so total time is O(n + m).

Reading the panels

What I learned writing this

← Back to all algorithms