Machine learning models are the core building blocks of artificial intelligence. As of this writing, ChatGPT is one of the most widely discussed AI chatbots in the media and the tech industry. It uses several large generative language models under the hood and can perform tasks that some might describe as super-human.

This advancement in AI showcases the potential of machine learning models and their transformative impact. With so many machine learning libraries freely available, you can even develop your own custom models. What's even better is that you can decouple the features of your model and control how they behave using feature flags.

Using feature flags with machine learning models

A feature flag can be thought of as a toggle switch for the feature it is linked to. Using a conditional if-statement in your code, you can render the feature when its flag is toggled on and hide it when the flag is toggled off. However, feature flags aren't solely used for showing or hiding features from users. They can also be configured to augment various processes, including Canary Releases, Progressive Deployments, A/B Testing, DevOps, and CI/CD pipelines, to name a few.
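As a minimal sketch of the toggle idea, the snippet below uses a plain dictionary as a stand-in for a flag service. The `FLAGS` dictionary, the `is_enabled` helper, and the `new_dashboard` key are hypothetical names used only for illustration; a real feature flag service would supply the value remotely.

```python
# Hypothetical in-memory flag store; a real service would serve these values remotely.
FLAGS = {"new_dashboard": False}


def is_enabled(key, default=False):
    """Return the flag's value, falling back to `default` for unknown keys."""
    return FLAGS.get(key, default)


def render_home():
    """Show the new feature only when its flag is toggled on."""
    if is_enabled("new_dashboard"):
        return "new dashboard"
    return "classic dashboard"


print(render_home())
# Toggling the flag changes the behavior without touching render_home().
FLAGS["new_dashboard"] = True
print(render_home())
```

The point of the pattern is that the code path is chosen at runtime by the flag's value, so the decision can be changed without editing the function that uses it.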

Feature flags can bring these benefits to your machine learning models as well. In the following section, I'll show you how to create two machine learning models for classifying text using Python, as it is a widely used programming language for developing machine learning models. Then, I'll use a feature flag to control which model is used for the classification. For this, I'll use ConfigCat. ConfigCat is a feature flag service with unlimited team size, awesome support, and a reasonable price tag that takes 10 minutes to learn. Using their extensive list of SDKs, you can easily integrate feature flags in just about any programming language and technology. Let's dive in!

Guide: How to use feature flags with machine learning models

Using a popular machine learning library called scikit-learn, I'll create two text classification models. One will serve as a base model, and the other as a pro model. This setup fits a scenario where only paid users get access to the pro model while everyone else uses the base model.

Both models take a sentence or a piece of text as input and classify it as one of the intents found in the data the model was trained on. Here are the intents each model can classify text into:

Base model: goodbye, greeting, business_hours
Pro model: goodbye, greeting, business_hours, payment_methods

The key difference is that the pro model is capable of classifying a text such as "Can I pay you with my Google Pay wallet?" to payment_methods, whereas the base model is not.

Setup of a demo application

I've created a sample app to demonstrate how a feature flag can be used to control which model classifies the text entered by a user.

In the app.py file, I've imported the resources needed to build and test each model. This is followed by two functions: train_classifier(model_type) and classify_text(text, model_type). The train_classifier function takes model_type as a parameter (either 'pro' or 'base'), and the classify_text takes the text to classify and the model_type to use to perform the classification.

# app.py
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def train_classifier(model_type):
    # Load the training data
    df = pd.read_json('./models/{0}/classification_data.json'.format(model_type))
    X_train, X_test, y_train, y_test = train_test_split(df['text'], df['intent'], random_state=0)

    # Train the model. TfidfVectorizer combines token counting and TF-IDF
    # weighting, so the saved vectorizer applies the same transformation at
    # prediction time that the model saw during training.
    vectorizer = TfidfVectorizer()
    X_train_tfidf = vectorizer.fit_transform(X_train)

    model = LinearSVC(dual="auto").fit(X_train_tfidf, y_train)

    # Save the vectorizer
    vec_file = './models/{0}/vectorizer.pickle'.format(model_type)
    with open(vec_file, 'wb') as f:
        pickle.dump(vectorizer, f)

    # Save the model
    mod_file = './models/{0}/classifier.model'.format(model_type)
    with open(mod_file, 'wb') as f:
        pickle.dump(model, f)

    # Print training complete to the terminal
    print("Training completed for {0} model".format(model_type))


def classify_text(text, model_type):
    # Load the vectorizer
    vec_file = './models/{0}/vectorizer.pickle'.format(model_type)
    with open(vec_file, 'rb') as f:
        vectorizer = pickle.load(f)

    # Load the model
    mod_file = './models/{0}/classifier.model'.format(model_type)
    with open(mod_file, 'rb') as f:
        model = pickle.load(f)

    # Classify the text
    text_features = vectorizer.transform([text])
    predicted = model.predict(text_features)
    # decision_function returns one score per intent; the highest score is
    # reported as a rough confidence value (it is not a probability)
    scores = model.decision_function(text_features)

    result = {
        'model': model_type,
        'intent': predicted[0],
        'confidence': max(scores[0]),
        'text': text
    }

    return result
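One thing worth noting about classify_text is that it unpickles the vectorizer and the model on every call. That is fine for a demo, but a long-running app might prefer to load each file only once. The sketch below shows one possible way to do that with `functools.lru_cache` from the standard library. The `load_artifact` helper is a hypothetical name and is not part of the sample app, and the temporary file only stands in for a real pickled vectorizer or model.

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=None)
def load_artifact(path):
    """Unpickle the file at `path` once; repeat calls return the cached object."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with a throwaway pickle standing in for a real vectorizer or model file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pickle")
with open(demo_path, "wb") as f:
    pickle.dump({"intents": ["greeting", "goodbye"]}, f)

first = load_artifact(demo_path)
second = load_artifact(demo_path)
print(first is second)  # The second call is served from the cache.
```

If you adopt something like this, keep in mind that retraining a model overwrites the file on disk but not the cached object, so you would need to call `load_artifact.cache_clear()` after training.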

In the interact.py file, I've imported both the train_classifier and classify_text functions from app.py and added a test_query() function, which takes the text entered by the user and passes it to the classify_text function, printing the result to the terminal.

The second function, run_app(), performs the command line interaction. Finally, a while loop was added at the bottom to keep the run_app() function active until the user exits the interaction.

# interact.py
from app import train_classifier, classify_text

def test_query():
    text = input("Hi, I'm a chatbot. Let's talk: ")
    result = classify_text(text, 'pro')
    print(result)

def run_app():
    option = input("1 - Interact | 2 - Train | 3 - Exit: ")

    if option == "1":
        test_query()
    elif option == "2":
        model = input("Model to train: 1 - Base | 2 - Pro: ")
        selected_model = 'base' if model == '1' else 'pro'
        train_classifier(selected_model)
    elif option == "3":
        return False

    return True

while True:
    if not run_app():
        break

Adding a feature flag

With the help of a feature flag, we can control which model performs the text classification. One exciting benefit of feature flags in this scenario is that you can switch models remotely without redeploying your application.

To create a feature flag, sign up for a free ConfigCat account.
In the dashboard, create a feature flag with the following details:

Name: Pro Model
Key: proModel
Hint: When on, the pro classifier model will be used

To use the feature flag, I'll install the ConfigCat SDK for Python. The SDK is used to connect the app to ConfigCat and download the latest value of the feature flag.

pip install configcat-client

Import the ConfigCat SDK in interact.py:

import configcatclient

Create the ConfigCat client and pass in your SDK key:

configcat_client = configcatclient.get(
    'YOUR-CONFIGCAT-SDK-KEY' # Replace with your SDK key
)

I'll modify the test_query() function in interact.py to only use the pro model for making a classification when the feature flag is on and the base model when it is off:

from app import train_classifier, classify_text
import configcatclient # Import the ConfigCat SDK

configcat_client = configcatclient.get(
    'YOUR-CONFIGCAT-SDK-KEY', # Replace with your SDK key
)

def test_query():
    text = input("Hi, I'm a chatbot. Let's talk: ")
    # Get the value of the proModel feature flag
    proModel = configcat_client.get_value('proModel', False)
    # Use the pro model if the feature flag is on
    model = 'pro' if proModel else 'base'
    result = classify_text(text, model)
    print(result)

# ...
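To look at the selection logic in isolation, here is a small sketch that can be run without a ConfigCat account. `StubFlagClient` and `choose_model` are hypothetical names used for illustration; the stub only mimics the `get_value(key, default)` call used above and is not part of the ConfigCat SDK.

```python
class StubFlagClient:
    """Hypothetical stand-in exposing the same get_value(key, default) call used above."""

    def __init__(self, flags):
        self._flags = flags

    def get_value(self, key, default_value):
        # Return the stored flag value, or the default when the key is unknown.
        return self._flags.get(key, default_value)


def choose_model(flag_client):
    """Return 'pro' when the proModel flag is on, otherwise 'base'."""
    pro_enabled = flag_client.get_value('proModel', False)
    return 'pro' if pro_enabled else 'base'


print(choose_model(StubFlagClient({'proModel': True})))
print(choose_model(StubFlagClient({'proModel': False})))
# Flag missing: the default of False selects the base model.
print(choose_model(StubFlagClient({})))
```

Passing False as the default is a deliberate choice here: if the flag value can't be determined, the app falls back to the base model rather than unintentionally exposing the pro model.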

Running the app

Prerequisites

Before testing the app and the feature flag, make sure you have Git and Python 3 (with pip) installed, since the steps below rely on them.

Steps

Clone the repository

git clone git@github.com:configcat-labs/feature-flags-with-ml-models-sample.git

Create and activate a virtual environment

python3 -m venv venv

source venv/bin/activate

Install dependencies and run the app

pip3 install -r requirements.txt

python3 interact.py

Train both base and pro models by entering the appropriate option in the command line. For this example, I'll train the base model first.

1 - Interact | 2 - Train | 3 - Exit: 2

Model to train: 1 - Base | 2 - Pro: 1

Training completed for base model

Train the pro model.

1 - Interact | 2 - Train | 3 - Exit: 2

Model to train: 1 - Base | 2 - Pro: 2

Training completed for pro model

Now, copy your SDK key from the ConfigCat dashboard and paste it into the YOUR-CONFIGCAT-SDK-KEY placeholder in the interact.py file.

Did it work?

Head over to the ConfigCat dashboard and enable the feature flag.
Run the app again and enter a text to classify. The output should show that the pro model is used for classifying the text:


1 - Interact | 2 - Train | 3 - Exit: 1

Hi, I'm a chatbot. Let's talk: Can I pay you with my Google Pay wallet?

{'model': 'pro', 'intent': 'payment_methods', 'confidence': 2.0369560833357596, 'text': 'Can I pay you with my Google Pay wallet?'}

Now disable the feature flag and run the app again. You should see that the base model is used to classify the text.

The ConfigCat SDK downloads and stores the latest values automatically every 60 seconds, so a change made in the dashboard can take up to a minute to reach the app. You can change this default behavior. Check out the ConfigCat SDK Docs to learn more.

1 - Interact | 2 - Train | 3 - Exit: 1

Hi, I'm a chatbot. Let's talk: Can I pay you with my Google Pay wallet?

{'model': 'base', 'intent': 'greeting', 'confidence': 0.32370119051141377, 'text': 'Can I pay you with my Google Pay wallet?'}
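If the app keeps reporting the old model for a short while after you toggle the flag, the 60-second refresh mentioned above is the likely reason. As a rough illustration of the idea (not the SDK's actual implementation), the sketch below caches a value and only fetches it again once it is older than the interval. `PolledValue`, `fetch`, and `clock` are hypothetical names, and the fake clock exists only to make the example deterministic.

```python
import time


class PolledValue:
    """Cache a fetched value and refresh it once it is older than `interval` seconds."""

    def __init__(self, fetch, interval=60, clock=time.monotonic):
        self._fetch = fetch
        self._interval = interval
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        # Fetch on the first read, or when the cached value has expired.
        if self._fetched_at is None or now - self._fetched_at >= self._interval:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Simulate a remote flag and a clock we can move forward by hand.
remote = {'proModel': True}
fake_time = [0.0]
flag = PolledValue(lambda: remote['proModel'], interval=60, clock=lambda: fake_time[0])

print(flag.get())            # First read fetches the remote value.
remote['proModel'] = False   # The flag is switched off remotely...
fake_time[0] = 30.0
print(flag.get())            # ...but the cached value is still served.
fake_time[0] = 60.0
print(flag.get())            # After the interval, the new value is picked up.
```

With the real SDK you don't need to write any of this yourself; the sketch is only meant to show why a toggle in the dashboard isn't visible instantly.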

From the above, you can see that the feature flag was used to control which model to use in classifying the text. When the flag was off, the base model was used, and the result intent was greeting, which is an incorrect classification. When the flag was on, the pro model was used, and the result intent was payment_methods, which is the correct classification.

The future of feature flags and machine learning models

With machine learning models becoming increasingly popular, I believe that feature flags will play a significant role in how they are used. Feature flags can be used to control the behavior of these models, what data to use when training a model, and even what model to use for a specific task. I'm excited to see how feature flags will be used with machine learning models in the future.

If you want to try what you've learned, check out ConfigCat. They have a forever-free plan that you can use to get started. You can also check out the ConfigCat Python SDK Docs to dive deeper into using feature flags with Python. Deploy any time, release when confident.

Stay connected to ConfigCat on Facebook, X (Twitter), LinkedIn, and GitHub.

Using Feature Flags with Machine Learning Models