"KMP Algorithm – The Secret Behind Efficient String Search"

🚀 Mastering KMP (Knuth–Morris–Pratt) String Matching Algorithm


📌 Introduction

When working with text processing, one of the most common problems is finding a substring (pattern) inside a larger string (text). The naive approach checks all possibilities, but this can take a lot of time for large strings.

The Knuth–Morris–Pratt (KMP) algorithm is a powerful solution that avoids unnecessary comparisons. It preprocesses the pattern to build a prefix table (LPS array) that helps skip repeated work, achieving linear time complexity.


📝 Problem Statement

Given two strings:

  • A text string T of length n.

  • A pattern string P of length m.

👉 Determine whether the pattern P occurs in T. If yes, return the starting index/indices.


🔍 Concept (with Visualization)

The naive algorithm re-checks characters after every mismatch:

Text:    A B A C A B A B A
Pattern: A B A B

At index 0, it matches A B A, then mismatches on the C.
The naive method goes back and starts again at index 1.
This causes repeated work. ❌

KMP optimization:
KMP builds an LPS (Longest Prefix Suffix) array for the pattern.
The LPS tells us where to resume in the pattern instead of starting all over again.

Example LPS for pattern "ABABAC":

PatternABABAC
LPS001230

That means:

  • After mismatch at C, instead of starting over, jump to index 3 in the pattern (skipping redundant checks).

Thus, KMP ensures O(n + m) complexity compared to O(n × m) for the naive approach.


✅ Example

Input:

Text    = "ABABABCABABABCABAB"
Pattern = "ABABAC"

Process (with LPS):

  • Match fails at index 5, but instead of restarting, KMP uses LPS to skip ahead.

Output:

Pattern found at index 0
Pattern found at index 7
Pattern found at index 12

💻 C++ Implementation

#include <bits/stdc++.h>
using namespace std;

// Function to build the LPS array
vector<int> computeLPS(string pattern) {
    int m = pattern.size();
    vector<int> lps(m, 0);

    int len = 0; // length of previous longest prefix suffix
    int i = 1;

    while (i < m) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = lps[len - 1]; // fallback
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
    return lps;
}

// KMP algorithm function
void KMP(string text, string pattern) {
    int n = text.size();
    int m = pattern.size();

    vector<int> lps = computeLPS(pattern);

    int i = 0; // index for text
    int j = 0; // index for pattern

    while (i < n) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }

        if (j == m) {
            cout << "Pattern found at index " << i - j << endl;
            j = lps[j - 1];
        } else if (i < n && pattern[j] != text[i]) {
            if (j != 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
}

int main() {
    string text = "ABABABCABABABCABAB";
    string pattern = "ABABAC";

    cout << "Text: " << text << endl;
    cout << "Pattern: " << pattern << endl;
    KMP(text, pattern);

    return 0;
}

⚖️ Time & Space Complexity

  • Time Complexity:

    • Preprocessing LPS: O(m)

    • Searching: O(n)

    • Total: O(n + m)

  • Space Complexity:

    • LPS array: O(m)

✅ Advantages

  • Linear time complexity.

  • Efficient for multiple pattern searches.

  • No backtracking in the text.

❌ Disadvantages

  • Slightly complex to implement compared to naive search.

  • Needs preprocessing (LPS) even for one-time searches.


  • LeetCode 28: Implement strStr()

  • LeetCode 686: Repeated String Match

  • Pattern Searching in large DNA sequences

  • Detecting plagiarism in text


🔗 GitHub Repository

👉 My GitHub Repository Link ( Check Out )


🏷️ Tags

Algorithms C++ String Matching KMP DSA Competitive Programming LeetCode

0
Subscribe to my newsletter

Read articles from Muhammad Umair Ullah directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Muhammad Umair Ullah
Muhammad Umair Ullah