Count Substrings with Exactly k Distinct Characters in a String

Pratik ShahPratik Shah
4 min read

Problem Statement

Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that have exactly k distinct characters.

Example:
Input: S = "aba", K = 2
Output: 3
Explanation: The substrings are "ab", "ba", and "aba".

  • Brute Force Approach

    In this approach we are going to use sets data structure to maintain the list which only contains distinct characters. The strategy is simple but robust: we will search for every potential substring containing exactly K different characters, counting each one as we go. By doing this, we make sure that no viable solution eludes us and we end up with a precise count of these unique substrings. We will use an unordered_set to optimize insertion time, which is O(1) on average and O(n) in the worst case.

    Solution :

      long long int substrCount (string s, int k) {
             int n = s.length() , count = 0;
             unordered_set<char>distinctChars;
              for (int i = 0; i < n; ++i) {
                  for (int j = i; j < n; ++j) {
                      distinctChars.insert(s[j]);
                      if (distinctChars.size() == k) {
                          ++count;
                      }
                      else if (distinctChars.size() > k) {
                          break;
                      }
                  }
    
                 distinctChars.clear(); 
              }
               return count ;
          }
    

    Obviously, the above solution won't pass all cases because of its O(n^2) time complexity.

    Time Complexity : O(n^2)

    Space Complexity : O(n^2)

  • Better Approach

    This approach uses a combination of a map data structure and a sliding window algorithm.

    Intuition

    We know that the total number of substrings in a window can be calculated using j - i + 1.

    If we can somehow calculate the total possible substrings where the number of distinct characters is at most K. Then by subtracting substrings where the number of distinct characters is at most K - 1. we can get the number of substrings where the distinct characters are exactly K.

    Steps to find the number of substrings with at most K distinct characters :

    1. Create variables to start the window, i.e., i = 0 and j = 0.

    2. Expand the window by mapping the frequency of the character at j with map[s[j]++.

    3. If the number of distinct characters becomes greater than K, shrink the window by moving i++ and decreasing the frequency of s[i].

    4. If the frequency of any character becomes 0, i.e., map[s[i]] == 0, then remove that character from the map.

    5. Repeat steps 3 and 4 until the number of distinct characters is greater than K, i.e., while(map.size() > k).

    6. On every iteration, simply count the total number of substrings in that window, i.e., j - i + 1.

Solution :

    long long int atMostCount(string s, int k) {
      unordered_map<char, int> fq;
      long long int i = 0, j = 0, sum = 0, n = s.size();
      while (j < n and i < n) {
        fq[s[j]]++;
        while (fq.size() > k) {
          fq[s[i]]--;
          if (fq[s[i]] == 0) {
            fq.erase(s[i]);
          }
          i++;
        }
        sum = sum + (j - i + 1);
        j++;
      }
      return sum;
    }

    long long int substrCount(string s, int k) {
      return atMostCount(s, k) - atMostCount(s, k - 1);
    }

Surprisingly, the above solution won't pass all cases becaus map.erase() may take O(n) in the worst case.

Time Complexity: O(n^2)

Space Complexity: O(1)

At most, the map will store 26 characters, i.e., O(26).

  • Optimal Approach

    In this approach, we will keep the same intuition but replace the map with a vector of size 26. And we will count the frequency and store it in a vector. Whenever we encounter a distinct character, we will simply increment a variable i.e., dist++.

      long long int atMostCount(string s , int k){
              vector<int> fq (26,0);
              long long int  i = 0 , j = 0 , sum = 0 ,  n = s.size() , dist = 0;
              while(j<n and i<n){
                  fq[s[j] - 'a']++;
                  if(fq[s[j] - 'a'] == 1) dist++;
                  while(dist>k){
                      fq[s[i] - 'a']--;
                      if(fq[s[i] - 'a'] == 0){
                          dist-- ;
                      }
                      i++;
                  }
                  sum = sum+(j- i + 1);
                  j++;
              }
              return sum ;
          }
    
          long long int substrCount (string s, int k) {
              return atMostCount(s,k) - atMostCount(s,k -1) ;
          }
    

    This optimal approach works for all test cases.

    Time Complexity: O(n)

    Space Complexity: O(1)

    The vector of size 26 is used, so it takes constant space, i.e., O(26).

  • Conclusion

    In this article, we address the problem of counting substrings with exactly k distinct characters in a given string. We first explore a brute force approach with a time complexity of O(n^2), followed by a more efficient sliding window technique using a hashmap. Despite improvements, this method has limitations, leading us to an optimal solution that leverages a fixed-size vector for character frequencies, achieving a linear time complexity of O(n) while ensuring space efficiency.

    Thank you for reading! ๐Ÿ™โœจ

20
Subscribe to my newsletter

Read articles from Pratik Shah directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Pratik Shah
Pratik Shah