MLflow Registry

Vidhya dharan

Continuing from our previous blog on image compression, this post covers the core MLflow concepts you need for deployment.

In the world of machine learning, managing models efficiently is crucial for ensuring smooth deployment and updates. One way to handle this is through the MLflow Model Registry, a centralized store with a set of APIs and a user interface (UI) to manage the full lifecycle of your ML models.

The registry not only stores your models but also keeps track of their lineage (i.e., which experiment and run produced the model) and offers version control, tagging, aliasing, and annotations, making collaboration and the path to production much easier.

Why a registry?

Imagine you're a machine learning engineer at a tech company, tasked with building and managing models. Your manager asks you to create a system where you train models and store them in a centralized repository. This repository will help your DevOps team easily retrieve the models for deployment into production, ensuring a smooth workflow from training to deployment.

This is where MLflow's Model Registry comes in handy. It gives you a reliable, centralized location to store your models with easy access and version control, while letting you track each model's lifecycle from experimentation to production.

Building and Tracking a Model

Consider the task of developing a random forest classifier model. In this case, we are building a model to classify different types of iris flowers based on their features. To begin, we’ll define some hyperparameters for our random forest model:

n_estimators = 10
max_depth = 5

With the hyperparameters defined, we can train the model inside an MLflow Tracking run. Metrics, parameters, and outputs logged within the run are recorded, allowing us to monitor the model's progress. Here's a code snippet to demonstrate tracking:

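import mlflow
from sklearn.ensemble import RandomForestClassifier

# Assumes experiment_id and the X_train/X_test/y_train split are already defined.
with mlflow.start_run(experiment_id=experiment_id) as run:
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth
    )
    model.fit(X=X_train, y=y_train)
    predictions = model.predict(X=X_test)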

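We use mlflow.start_run to open a run that spans the entire process, from training to making predictions. Any parameters, metrics, or artifacts logged inside the with block are attached to that run, which is what gives us insight into the model's performance. Note that the snippet itself logs no parameters or metrics unless autologging is enabled or explicit calls such as mlflow.log_param and mlflow.log_metric are added.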

Registering a Model in MLflow

Once the model has been successfully trained and tracked, the next step is to store it in a centralized repository. The MLflow Model Registry serves this purpose by allowing you to register and version your models.


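from mlflow.tracking import MlflowClient

client = MlflowClient()

# Create the registry entry, then register the logged model as its first version.
client.create_registered_model(name=registered_model_name)
client.create_model_version(
    name=registered_model_name,
    source=f"runs:/{run_id}/{log_model_path}",
    run_id=run_id
)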

Running this code creates a registered model under the given name and registers the logged model as a new version of it. Versions are numbered sequentially, so the first one typically appears as "Version 1" in the registry, and you can see it in the MLflow UI.
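The source argument is a run-relative URI of the form runs:/<run_id>/<artifact_path>, and once a version is registered it can be addressed as models:/<name>/<version>. The two helpers below are hypothetical names used only to illustrate these string formats; they are not part of MLflow, and the run ID shown is a placeholder.

```python
def run_model_uri(run_id: str, artifact_path: str) -> str:
    """Build the URI of a model that was logged under a run."""
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def registry_model_uri(name: str, version: int) -> str:
    """Build the URI of one specific registered model version."""
    return f"models:/{name}/{version}"


# Placeholder values for illustration only.
example_source = run_model_uri("abc123", "random_forest_model")
example_registered = registry_model_uri("WineClassifierModel", 1)
print(example_source)
print(example_registered)
```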

Updating the Model Version

Once the initial model is deployed, you might need to update it. For example, if there’s a bug in your model or if you have improved its performance, MLflow Model Registry’s versioning capabilities allow you to update the model seamlessly. The process is simple and ensures that your DevOps team can always retrieve the latest model version for deployment.

client.create_model_version(
    name=registered_model_name,
    source=f"runs:/{new_run_id}/{log_model_path}",
    run_id=new_run_id
)

This code snippet registers a new version of the same model from a new run (new_run_id), so the updated model is available for production alongside the earlier version. You can track all the versions through the MLflow UI, which makes it easy to switch between different model versions or roll back if needed.

Centralised model storage

Let's put everything together using the Wine dataset.

import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score

# Load the Wine dataset and hold out 20% for testing.
wine = load_wine()
X = wine.data
y = wine.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameters for the random forest.
n_estimators = 15
max_depth = 6

# Group runs under a named experiment (created if it does not exist).
experiment_name = "Wine"
mlflow.set_experiment(experiment_name)

with mlflow.start_run() as run:
    # Train and evaluate inside the run so everything logged is attached to it.
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    # Record the hyperparameters and the test accuracy.
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)
    mlflow.log_metric("accuracy", accuracy)

    # Store the trained model as a run artifact.
    model_path = "random_forest_model"
    mlflow.sklearn.log_model(model, model_path)

    run_id = run.info.run_id
    experiment_id = run.info.experiment_id

client = mlflow.tracking.MlflowClient()

# Register the logged model; create_registered_model raises if the name already exists.
registered_model_name = "WineClassifierModel"
client.create_registered_model(name=registered_model_name)
client.create_model_version(
    name=registered_model_name,
    source=f"runs:/{run_id}/{model_path}",
    run_id=run_id
)

This code achieves the following:

  • Loads the Wine dataset.

  • Trains a RandomForestClassifier.

  • Tracks parameters and accuracy using MLflow.

  • Logs the trained model.

  • Registers the model into MLflow’s model registry with versioning.

By integrating MLflow's model registry into your workflow, you can ensure that your models are always production-ready, enabling seamless collaboration between engineers and DevOps teams. With this setup, you can focus more on building better models, knowing that their deployment and management are handled smoothly.

GitHub repo: https://github.com/VidhyadharanSS/MLFlow
