"KMP Algorithm – The Secret Behind Efficient String Search"


🚀 Mastering KMP (Knuth–Morris–Pratt) String Matching Algorithm
📌 Introduction
When working with text processing, one of the most common problems is finding a substring (pattern) inside a larger string (text). The naive approach checks all possibilities, but this can take a lot of time for large strings.
The Knuth–Morris–Pratt (KMP) algorithm is a powerful solution that avoids unnecessary comparisons. It preprocesses the pattern to build a prefix table (LPS array) that helps skip repeated work, achieving linear time complexity.
📝 Problem Statement
Given two strings:
A text string
T
of lengthn
.A pattern string
P
of lengthm
.
👉 Determine whether the pattern P
occurs in T
. If yes, return the starting index/indices.
🔍 Concept (with Visualization)
The naive algorithm re-checks characters after every mismatch:
Text: A B A C A B A B A
Pattern: A B A B
At index 0
, it matches A B A
, then mismatches on the C
.
The naive method goes back and starts again at index 1
.
This causes repeated work. ❌
✅ KMP optimization:
KMP builds an LPS (Longest Prefix Suffix) array for the pattern.
The LPS tells us where to resume in the pattern instead of starting all over again.
Example LPS for pattern "ABABAC"
:
Pattern | A | B | A | B | A | C |
LPS | 0 | 0 | 1 | 2 | 3 | 0 |
That means:
- After mismatch at
C
, instead of starting over, jump to index3
in the pattern (skipping redundant checks).
Thus, KMP ensures O(n + m) complexity compared to O(n × m) for the naive approach.
✅ Example
Input:
Text = "ABABABCABABABCABAB"
Pattern = "ABABAC"
Process (with LPS):
- Match fails at index 5, but instead of restarting, KMP uses LPS to skip ahead.
Output:
Pattern found at index 0
Pattern found at index 7
Pattern found at index 12
💻 C++ Implementation
#include <bits/stdc++.h>
using namespace std;
// Function to build the LPS array
vector<int> computeLPS(string pattern) {
int m = pattern.size();
vector<int> lps(m, 0);
int len = 0; // length of previous longest prefix suffix
int i = 1;
while (i < m) {
if (pattern[i] == pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1]; // fallback
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
// KMP algorithm function
void KMP(string text, string pattern) {
int n = text.size();
int m = pattern.size();
vector<int> lps = computeLPS(pattern);
int i = 0; // index for text
int j = 0; // index for pattern
while (i < n) {
if (pattern[j] == text[i]) {
i++;
j++;
}
if (j == m) {
cout << "Pattern found at index " << i - j << endl;
j = lps[j - 1];
} else if (i < n && pattern[j] != text[i]) {
if (j != 0)
j = lps[j - 1];
else
i++;
}
}
}
int main() {
string text = "ABABABCABABABCABAB";
string pattern = "ABABAC";
cout << "Text: " << text << endl;
cout << "Pattern: " << pattern << endl;
KMP(text, pattern);
return 0;
}
⚖️ Time & Space Complexity
Time Complexity:
Preprocessing LPS: O(m)
Searching: O(n)
Total: O(n + m)
Space Complexity:
- LPS array: O(m)
✅ Advantages
Linear time complexity.
Efficient for multiple pattern searches.
No backtracking in the text.
❌ Disadvantages
Slightly complex to implement compared to naive search.
Needs preprocessing (LPS) even for one-time searches.
📚 Related Problems (LeetCode & CP)
LeetCode 28: Implement
strStr()
LeetCode 686: Repeated String Match
Pattern Searching in large DNA sequences
Detecting plagiarism in text
🔗 GitHub Repository
👉 My GitHub Repository Link ( Check Out )
🏷️ Tags
Algorithms
C++
String Matching
KMP
DSA
Competitive Programming
LeetCode
Subscribe to my newsletter
Read articles from Muhammad Umair Ullah directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
