Tracking ML Experiments with MLflow: A Simple Guide for Beginners


Introduction
This blog draws inspiration from the excellent MLflow tutorial by CodeBasics, which clearly demonstrates the core concepts we will be discussing here. If you are looking for a more detailed or visual walkthrough, I highly recommend checking it out.
This post is written from the perspective of a beginner and aims to offer a more hands-on, less theoretical explanation of MLflow, based on my own experience implementing it in a real project. Think of it as a beginner learning out loud.
What is MLflow?
MLflow is an open-source MLOps tool that helps you track, log, and manage everything involved in machine learning experiments including metrics, parameters, models, and other useful artifacts.
It is especially useful when you are trying out different models or hyperparameters and need a way to compare them without losing track. Instead of manually noting results, MLflow stores everything in one place, helping you stay organized and productive.
For example, in my Titanic survival prediction project, I used MLflow to log different models such as Logistic Regression and Random Forest, track accuracy scores, and compare results using the MLflow UI. This made it easy to identify the best-performing model for deployment.
How to install and set up MLflow?
It is actually pretty straightforward to install MLflow in your Python notebook. Just run:
pip install mlflow
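To confirm the install worked, you can check the version from your terminal:
mlflow --version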
Now, to set up MLflow, we will be using my titanic-model code as a reference, which can be found in my GitHub repository. For this example, I trained and compared three different models: Logistic Regression, Random Forest, and Gradient Boosting Classifier.
Here's a simplified version of the model setup:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
models = [
    (
        "Logistic Regression",
        {
            "class_weight": None,
            "random_state": 8888,
            "solver": "lbfgs",
            "max_iter": 100
        },
        LogisticRegression(),
    ),
    (
        "Random Forest",
        {
            "n_estimators": 100,
            "random_state": 42
        },
        RandomForestClassifier(),
    ),
    (
        "Gradient Boosting Classifier",
        {
            "n_estimators": 100,
            "learning_rate": 1.0,
            "max_depth": 1,
            "random_state": 0
        },
        GradientBoostingClassifier(),
    ),
]
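The logging loop in the next step also relies on a reports list containing each model's classification report. Here is a minimal sketch of one way to build it, assuming X_train, X_test, y_train, and y_test come from your own train/test split of the Titanic data:
from sklearn.metrics import classification_report

reports = []
for model_name, params, model in models:
    model.set_params(**params)   # apply the hyperparameters defined above
    model.fit(X_train, y_train)  # X_train / y_train assumed from your own split
    y_pred = model.predict(X_test)
    # output_dict=True returns the nested dictionary the logging loop indexes into
    reports.append(classification_report(y_test, y_pred, output_dict=True))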
Now we integrate MLflow to log the model parameters and performance metrics for each run.
import mlflow
import mlflow.sklearn

# Set the tracking URI (for local use)
mlflow.set_tracking_uri("http://localhost:5000")

# Set the experiment name
mlflow.set_experiment("Accuracy Model v3")

for i, element in enumerate(models):
    model_name = element[0]
    params = element[1]
    model = element[2]
    report = reports[i]  # the classification report for this model

    with mlflow.start_run(run_name=model_name):
        mlflow.log_params(params)
        mlflow.log_metrics({
            'accuracy': report['accuracy'],
            'recall_class_1': report['1']['recall'],
            'recall_class_0': report['0']['recall'],
            'f1_score_macro': report['macro avg']['f1-score']
        })

        # Log the trained model
        mlflow.sklearn.log_model(model, "model", registered_model_name=model_name)
To view your runs visually, launch the MLflow tracking UI with:
mlflow ui
Then open your browser at http://127.0.0.1:5000. You will see a dashboard that shows all your logged experiments, parameters, metrics, and models.
Logging Parameters, Metrics, and Models
MLflow provides a clean API to track everything that matters in your experiments.
Here are the three main functions you will use:
Logging Parameters
mlflow.log_param("learning_rate", 0.1)
mlflow.log_param("n_estimators", 100)
These are the hyperparameters of your model that you might want to compare across runs.
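If your hyperparameters already live in a dictionary, as in the models list above, you can log them all at once with mlflow.log_params (this is what the training loop earlier does):
params = {"learning_rate": 0.1, "n_estimators": 100}
mlflow.log_params(params)  # logs every key-value pair in one call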
Logging Metrics
mlflow.log_metric("accuracy", 0.875)
You can log accuracy, precision, recall, or any other custom metric you calculate.
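log_metric also accepts a step argument, which is useful when you compute the same metric repeatedly (for example once per epoch) and want to see it as a curve in the UI. A small sketch, where accuracy_per_epoch is a placeholder for values from your own training loop:
accuracy_per_epoch = [0.71, 0.78, 0.82, 0.85]  # placeholder values
for epoch, acc in enumerate(accuracy_per_epoch):
    mlflow.log_metric("val_accuracy", acc, step=epoch)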
Logging the Model
mlflow.sklearn.log_model(model, "model")
This saves your trained model in a format you can reload or even serve later.
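Once a model is logged, you can load it back later using the run ID shown in the UI. A quick sketch, where <run_id> is a placeholder you would copy from your own run, and X_test is assumed from your own split:
import mlflow.sklearn

# "runs:/<run_id>/model" points at the "model" artifact logged in that run
loaded_model = mlflow.sklearn.load_model("runs:/<run_id>/model")
predictions = loaded_model.predict(X_test)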
What Gets Stored and Where?
Once you run your experiment, MLflow stores everything inside a folder called mlruns in your project directory. Each experiment is given a unique ID, and inside that folder, you will find:
A params folder containing your logged parameters
A metrics folder for evaluation scores
An artifacts folder which includes the saved model
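With the default local file store (no tracking server), the layout looks roughly like this; the exact contents can vary between MLflow versions:
mlruns/
  <experiment_id>/
    <run_id>/
      params/      # one file per logged parameter
      metrics/     # one file per logged metric
      artifacts/   # the saved model and other files
      meta.yaml    # run metadata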
You can compare runs side by side in the UI, which makes model selection and debugging much easier. All of this is visible in the web application, and MLflow does most of the heavy lifting around management and data consolidation. Feel free to mess around with the UI and draw your own conclusions.
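If you would rather compare runs in code instead of the UI, recent MLflow versions also let you pull them into a pandas DataFrame with mlflow.search_runs. A small sketch, using the experiment name from earlier:
import mlflow

runs = mlflow.search_runs(experiment_names=["Accuracy Model v3"])
# Logged values appear as columns such as "metrics.accuracy" and "params.solver"
print(runs[["run_id", "metrics.accuracy", "metrics.f1_score_macro"]])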
Things I Learned (or Struggled With)
Like any beginner working with a new tool, I ran into a few bumps while setting up MLflow and learned a lot in the process.
Confusion About MLflow Runs
At first, I didn’t fully understand what a “run” was in MLflow. I was running multiple models, but everything was showing up under the same run or being overwritten. I realized I had forgotten to call mlflow.start_run() with a unique run_name. This could possibly fix it for you:
with mlflow.start_run(run_name="Gradient Boosting"):
    ...
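Each with block opens and closes its own run, which is what stops results from overwriting each other. If you ever start a run without the context manager, remember to end it yourself:
run = mlflow.start_run(run_name="Gradient Boosting")
mlflow.log_metric("accuracy", 0.875)
mlflow.end_run()  # without this, the next start_run will complain about an active run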
MLflow UI Not Launching
I tried running mlflow ui and nothing happened. It turned out I had a port conflict on port 5000, as another local server was already running. Even after sorting that out, it still wouldn’t launch. Running the command below got it working for the first time; after that, it started normally with plain mlflow ui:
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./artifacts \
--host 0.0.0.0 \
--port 5000
Then you should be able to access everything as normal.
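If the only issue is that port 5000 is taken, a simpler alternative is to run the UI on a different port and open that in the browser instead:
mlflow ui --port 5001
Then visit http://127.0.0.1:5001 instead of the default address.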
Next Steps
Now that I’ve successfully set up MLflow and tracked multiple models in my Titanic prediction project, here’s what I plan to learn next:
DVC (Data Version Control): To manage data and model files across versions and collaborators.
Prefect: To automate my training pipeline and possibly run it on a schedule.
I’m also working on a blog series covering each of these topics as I learn them.
Want to Follow Along?
You can check out my code and future updates here:
https://github.com/Labreo/titanic-ml
I also post progress daily on Twitter:
https://x.com/Kaker0th
If you're just starting with MLOps too, feel free to reach out or share what you're building. Let's learn and grow together!
Written by Kanak Waradkar
I'm an aspiring MLOps engineer exploring the full machine learning lifecycle through hands-on projects. Currently working with MLflow, DVC, Prefect, and Docker to build and automate pipelines. I write beginner-friendly blogs and build in public to learn better and connect with like-minded devs.