🚀 "Mastering Rabin–Karp: The Rolling Hash Trick Every C++ Coder Must Know"

Rabin-Karp Algorithm: String Matching with Rolling Hash

String matching is one of the fundamental problems in Data Structures and Algorithms (DSA).
Among the most famous algorithms for pattern searching is the Rabin-Karp Algorithm, which uses hashing to efficiently find all occurrences of a pattern inside a text.


🔹 Theory & Approach

The Rabin-Karp algorithm solves the substring search problem:

Given a text T of length n and a pattern P of length m, find all the occurrences of P in T.

Instead of checking each substring of the text directly (which would be slow), Rabin-Karp uses a hash function to convert strings into numbers. This allows us to quickly compare numbers instead of comparing full substrings.

Key Idea:

  1. Compute the hash value of the pattern P.

  2. Compute the hash value of the first window of size m in T.

  3. Slide the window one character at a time:

    • Update the hash value in O(1) using a rolling hash formula.

    • If the hash values match, do a final character check (to avoid collisions).

  4. Continue until the end of the text.


🔹 Visual Explanation

Imagine you have:

Text: A B A C A B A

Pattern: A B A

  1. Compute hash of "ABA" → say it is 123.

  2. Compute hash of the first substring "ABA" of text → 123. ✅ Match found.

  3. Slide the window to "BAC" → hash 456123. ❌ Not a match.

  4. Slide to "ACA", "CAB", and so on until the end.

👉 The rolling hash lets us compute the next hash without recomputing the whole substring.
For example: NewHash = (OldHash - FirstChar Base^(m-1)) Base + NewChar


🔹 Pseudocode

patternHash = hash(pattern)

textHash = hash(first m characters of text)

for i in 0 to n - m:

if patternHash == textHash:

if substring(text, i, m) == pattern:

print "Pattern found at index i"

# Slide window: compute new hash

if i < n - m:

textHash = roll(textHash, text[i], text[i+m])


🔹 C++ Implementation

#include <bits/stdc++.h>
using namespace std;

// Function to perform Rabin-Karp pattern search
void rabinKarp(string text, string pattern) {
    int n = text.size();
    int m = pattern.size();
    int base = 256;         // Number of possible characters
    int mod = 101;          // A prime number (to reduce collisions)

    int patternHash = 0;    // Hash of the pattern
    int textHash = 0;       // Hash of current text window
    int h = 1;              // base^(m-1) % mod

    // Precompute h = pow(base, m-1) % mod
    for (int i = 0; i < m - 1; i++)
        h = (h * base) % mod;

    // Compute initial hash of pattern and first window
    for (int i = 0; i < m; i++) {
        patternHash = (base * patternHash + pattern[i]) % mod;
        textHash = (base * textHash + text[i]) % mod;
    }

    // Slide the window over the text
    for (int i = 0; i <= n - m; i++) {
        // If hash values match, check characters one by one
        if (patternHash == textHash) {
            bool match = true;
            for (int j = 0; j < m; j++) {
                if (text[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match)
                cout << "Pattern found at index " << i << endl;
        }

        // Compute hash for next window
        if (i < n - m) {
            textHash = (base * (textHash - text[i] * h) + text[i + m]) % mod;
            if (textHash < 0)
                textHash = (textHash + mod);
        }
    }
}

// Driver code
int main() {
    string text = "ABABACABA";
    string pattern = "ABA";

    cout << "Text: " << text << endl;
    cout << "Pattern: " << pattern << endl;

    rabinKarp(text, pattern);

    return 0;
}

🔹 Time and Space Complexity

  • Average Case Time Complexity: O(n + m)

  • Worst Case Time Complexity: O(n * m) (when many hash collisions occur)

  • Space Complexity: O(1) (constant extra space, apart from input storage)


Here are some famous problems where Rabin-Karp or hashing-based string search can be applied:

  1. LeetCode 28: Find the Index of the First Occurrence in a String

  2. LeetCode 187: Repeated DNA Sequences

  3. LeetCode 1044: Longest Duplicate Substring

  4. GFG: Pattern Searching

  5. Problems from Blind 75 / NeetCode 150 / Striver’s A2Z DSA Sheet under Strings.


🔹 GitHub Repository

👉 Check the full solutions on my GitHub Repository

https://github.com/UmairDevloper/DSA-in-C-.git


🔹 Tags

#DSA #Algorithms #RabinKarp #StringMatching #Hashing #C++ #LeetCode #Blind75 #NeetCode #Hashnode #GitHub

0
Subscribe to my newsletter

Read articles from Muhammad Umair Ullah directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Muhammad Umair Ullah
Muhammad Umair Ullah