Count k Distinct Substrings

Problem Statement

Given a string of lowercase alphabets, count all possible substrings (not necessarily distinct) that have exactly k distinct characters.

Example:
Input: S = "aba", K = 2
Output: 3
Explanation: The substrings are "ab", "ba", and "aba".

Brute Force Approach

In this approach we are going to use sets data structure to maintain the list which only contains distinct characters. The strategy is simple but robust: we will search for every potential substring containing exactly K different characters, counting each one as we go. By doing this, we make sure that no viable solution eludes us and we end up with a precise count of these unique substrings. We will use an unordered_set to optimize insertion time, which is O(1) on average and O(n) in the worst case.

Solution :

  long long int substrCount (string s, int k) {
         int n = s.length() , count = 0;
         unordered_set<char>distinctChars;
          for (int i = 0; i < n; ++i) {
              for (int j = i; j < n; ++j) {
                  distinctChars.insert(s[j]);
                  if (distinctChars.size() == k) {
                      ++count;
                  }
                  else if (distinctChars.size() > k) {
                      break;
                  }
              }

             distinctChars.clear(); 
          }
           return count ;
      }

Obviously, the above solution won't pass all cases because of its O(n^2) time complexity.

Time Complexity : O(n^2)

Space Complexity : O(n^2)

Better Approach

This approach uses a combination of a map data structure and a sliding window algorithm.

Intuition

We know that the total number of substrings in a window can be calculated using j - i + 1.

If we can somehow calculate the total possible substrings where the number of distinct characters is at most K. Then by subtracting substrings where the number of distinct characters is at most K - 1. we can get the number of substrings where the distinct characters are exactly K.

Steps to find the number of substrings with at most K distinct characters :
1. Create variables to start the window, i.e., i = 0 and j = 0.
2. Expand the window by mapping the frequency of the character at j with map[s[j]++.
3. If the number of distinct characters becomes greater than K, shrink the window by moving i++ and decreasing the frequency of s[i].
4. If the frequency of any character becomes 0, i.e., map[s[i]] == 0, then remove that character from the map.
5. Repeat steps 3 and 4 until the number of distinct characters is greater than K, i.e., while(map.size() > k).
6. On every iteration, simply count the total number of substrings in that window, i.e., j - i + 1.

Solution :

    long long int atMostCount(string s, int k) {
      unordered_map<char, int> fq;
      long long int i = 0, j = 0, sum = 0, n = s.size();
      while (j < n and i < n) {
        fq[s[j]]++;
        while (fq.size() > k) {
          fq[s[i]]--;
          if (fq[s[i]] == 0) {
            fq.erase(s[i]);
          }
          i++;
        }
        sum = sum + (j - i + 1);
        j++;
      }
      return sum;
    }

    long long int substrCount(string s, int k) {
      return atMostCount(s, k) - atMostCount(s, k - 1);
    }

Surprisingly, the above solution won't pass all cases becaus map.erase() may take O(n) in the worst case.

Time Complexity: O(n^2)

Space Complexity: O(1)

At most, the map will store 26 characters, i.e., O(26).

Optimal Approach

In this approach, we will keep the same intuition but replace the map with a vector of size 26. And we will count the frequency and store it in a vector. Whenever we encounter a distinct character, we will simply increment a variable i.e., dist++.

  long long int atMostCount(string s , int k){
          vector<int> fq (26,0);
          long long int  i = 0 , j = 0 , sum = 0 ,  n = s.size() , dist = 0;
          while(j<n and i<n){
              fq[s[j] - 'a']++;
              if(fq[s[j] - 'a'] == 1) dist++;
              while(dist>k){
                  fq[s[i] - 'a']--;
                  if(fq[s[i] - 'a'] == 0){
                      dist-- ;
                  }
                  i++;
              }
              sum = sum+(j- i + 1);
              j++;
          }
          return sum ;
      }

      long long int substrCount (string s, int k) {
          return atMostCount(s,k) - atMostCount(s,k -1) ;
      }

This optimal approach works for all test cases.

Time Complexity: O(n)

Space Complexity: O(1)

The vector of size 26 is used, so it takes constant space, i.e., O(26).

Conclusion

In this article, we address the problem of counting substrings with exactly k distinct characters in a given string. We first explore a brute force approach with a time complexity of O(n^2), followed by a more efficient sliding window technique using a hashmap. Despite improvements, this method has limitations, leading us to an optimal solution that leverages a fixed-size vector for character frequencies, achieving a linear time complexity of O(n) while ensuring space efficiency.

Thank you for reading! 🙏✨

Count Substrings with Exactly k Distinct Characters in a String

Problem Statement

Subscribe to my newsletter

Pratik Shah

Pratik Shah