09 Model Evaluation & Dagshub

Yash Maini
4 min read

Model Evaluation with MLflow & Dagshub: The Real-World Teenage ML Engineer's Playbook

Dagshub: The “GitHub for Data Science” That Actually Gets It

Alright, first up: let’s talk about Dagshub. If you’re tired of losing track of your models and datasets, or you just want to flex your ML workflow to your friends (or future employers), Dagshub is your new best friend.

What’s cool about Dagshub?

  • It’s like GitHub, but made for data science. You can track your code, data, models, and even all your experiment results.

  • It works with MLflow, so you get a slick dashboard to compare how different models perform.

  • Collaboration is a breeze. You can literally share your whole ML pipeline with your squad or mentor.

Honestly, once you use it, you’ll wonder how you ever managed with just folders named “final_model_v3_really_final”.
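
So how does the MLflow integration actually hook up? Here’s a minimal sketch, assuming a Dagshub repo with MLflow tracking enabled. The username, repo name, and token below are placeholders; Dagshub exposes a tracking server at https://dagshub.com/&lt;user&gt;/&lt;repo&gt;.mlflow and authenticates with your Dagshub access token.

import os
import mlflow

# Placeholders: swap in your own Dagshub username, repo name, and access token.
os.environ["MLFLOW_TRACKING_USERNAME"] = "your-dagshub-username"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-dagshub-token"
mlflow.set_tracking_uri("https://dagshub.com/your-dagshub-username/your-repo.mlflow")

# Anything you log from now on lands in the repo's Experiments tab on Dagshub.
with mlflow.start_run():
    mlflow.log_metric("smoke_test", 1.0)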

Setting Up the Model Evaluation: No More Guesswork

You know how annoying it is when you can’t remember which model you trained with which data? That’s why we use configuration files. Here’s the deal: all the important paths and filenames go into config.yaml.

model_evaluation:
  root_dir: artifacts/model_evaluation
  test_data_path: artifacts/data_transformation/test.csv
  model_path: artifacts/model_trainer/model.joblib
  metric_file_name: artifacts/model_evaluation/metrics.json
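
One thing config.yaml doesn’t carry is the model’s hyperparameters. Those live in a separate params.yaml (not shown in this post) and end up in the all_params field of the dataclass below. Since the registered model is an ElasticNet, a purely illustrative version might look like this:

ElasticNet:
  alpha: 0.1
  l1_ratio: 0.5

Logging these alongside the metrics is what makes runs comparable on the MLflow dashboard later.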

In Python, we keep things tight with a dataclass. It’s like a cheat code for keeping your configs organized:

from dataclasses import dataclass
from pathlib import Path

@dataclass
class ModelEvaluationConfig:
    root_dir: Path
    test_data_path: Path
    model_path: Path
    all_params: dict
    metric_file_name: Path
    target_column: str
    mlflow_uri: str

And to make sure we always have the right config, we use a function that grabs everything from the YAML and puts it into this dataclass. No more “where did I save that model?” moments.
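
That function lives in the project’s ConfigurationManager, which isn’t reproduced in this post, so here’s a rough sketch of how it could look. The read_yaml/create_directories helpers, the params.yaml key, and the schema.yaml target column are assumptions; match them to your own repo.

import os
from pathlib import Path

from src.Predict_Pipe.utils.common import read_yaml, create_directories  # assumed utils module

class ConfigurationManager:
    def __init__(self, config_path=Path("config/config.yaml"),
                 params_path=Path("params.yaml"), schema_path=Path("schema.yaml")):
        # read_yaml is assumed to return an attribute-accessible object (e.g. a ConfigBox)
        self.config = read_yaml(config_path)
        self.params = read_yaml(params_path)
        self.schema = read_yaml(schema_path)

    def get_model_evaluation_config(self) -> ModelEvaluationConfig:
        config = self.config.model_evaluation
        create_directories([config.root_dir])

        return ModelEvaluationConfig(
            root_dir=Path(config.root_dir),
            test_data_path=Path(config.test_data_path),
            model_path=Path(config.model_path),
            all_params=self.params.ElasticNet,             # assumed key in params.yaml
            metric_file_name=Path(config.metric_file_name),
            target_column=self.schema.TARGET_COLUMN.name,  # assumed key in schema.yaml
            mlflow_uri=os.getenv("MLFLOW_TRACKING_URI", ""),
        )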

The ModelEvaluation Class: Where the Magic Happens

This class is the MVP. It loads your test data and trained model, runs the predictions, calculates the metrics, and logs everything to MLflow (which Dagshub tracks for you).

import os
from pathlib import Path
from urllib.parse import urlparse

import joblib
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

from src.Predict_Pipe.utils.common import save_json  # project utility; import path assumed, adjust to your repo


class ModelEvaluation:
    def __init__(self, config: ModelEvaluationConfig):
        self.config = config

    def eval_metrics(self, actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2

    def log_into_mlflow(self):
        test_data = pd.read_csv(self.config.test_data_path)
        model = joblib.load(self.config.model_path)

        test_x = test_data.drop([self.config.target_column], axis=1)
        test_y = test_data[[self.config.target_column]]

        # Load MLflow credentials from .env
        MLFLOW_TRACKING_URI = os.getenv("MLFLOW_TRACKING_URI")
        MLFLOW_TRACKING_USERNAME = os.getenv("MLFLOW_TRACKING_USERNAME")
        MLFLOW_TRACKING_PASSWORD = os.getenv("MLFLOW_TRACKING_PASSWORD")

        if not all([MLFLOW_TRACKING_URI, MLFLOW_TRACKING_USERNAME, MLFLOW_TRACKING_PASSWORD]):
            raise ValueError("Missing MLflow environment variables. Check your .env file.")

        mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
        os.environ["MLFLOW_TRACKING_USERNAME"] = MLFLOW_TRACKING_USERNAME
        os.environ["MLFLOW_TRACKING_PASSWORD"] = MLFLOW_TRACKING_PASSWORD

        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

        with mlflow.start_run():
            predicted_qualities = model.predict(test_x)
            (rmse, mae, r2) = self.eval_metrics(test_y, predicted_qualities)

            scores = {"rmse": rmse, "mae": mae, "r2": r2}
            save_json(path=Path(self.config.metric_file_name), data=scores)

            mlflow.log_params(self.config.all_params)
            mlflow.log_metric("rmse", rmse)
            mlflow.log_metric("r2", r2)
            mlflow.log_metric("mae", mae)

            if tracking_url_type_store != "file":
                mlflow.sklearn.log_model(model, "model", registered_model_name="ElasticnetModel")
            else:
                mlflow.sklearn.log_model(model, "model")

        print("Model evaluation and logging to MLflow completed successfully.")
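
One practical note: the three environment variables checked above are expected to come from a .env file at the project root, loaded somewhere earlier (for example with python-dotenv’s load_dotenv()). The values below are placeholders, not real credentials; on Dagshub the tracking URI ends in .mlflow and the password is your access token.

MLFLOW_TRACKING_URI=https://dagshub.com/<your-username>/<your-repo>.mlflow
MLFLOW_TRACKING_USERNAME=<your-username>
MLFLOW_TRACKING_PASSWORD=<your-dagshub-token>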

What’s Actually Happening?

  • Loads your test data and model: No more manual file picking.

  • Evaluates predictions: You get RMSE, MAE, and R², so you know if your model is actually any good.

  • Saves metrics to a JSON file: Handy for quick checks or sharing results. (A minimal sketch of save_json follows this list.)

  • Logs everything to MLflow: This is where Dagshub comes in. Now you can compare every experiment, see what worked, and flex your best models.
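
save_json is one of the project’s little utility helpers and isn’t shown in this post. A minimal sketch, assuming it just writes a dict to disk as JSON, looks like this:

import json
from pathlib import Path

def save_json(path: Path, data: dict):
    """Minimal sketch of the project's save_json utility: dump a dict to a JSON file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as f:
        json.dump(data, f, indent=4)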

How to Run the Pipeline (No More “It Worked On My Laptop”)

Here’s how you set up the pipeline so anyone (even your future self) can run it:

from src.Predict_Pipe.config.configuration import ConfigurationManager
from src.Predict_Pipe.components.model_evaluation import ModelEvaluation

class ModelEvaluationTrainingPipeline:
    def __init__(self):
        pass

    def initiate_model_evaluation(self):
        config = ConfigurationManager()
        model_evaluation_config = config.get_model_evaluation_config()
        model_evaluation = ModelEvaluation(config=model_evaluation_config)
        model_evaluation.log_into_mlflow()

And you kick it off in main.py like this:

STAGE_NAME = "Model Evaluation stage"
try:
    logger.info(f">>>>>> stage {STAGE_NAME} started <<<<<<")
    obj = ModelEvaluationTrainingPipeline()
    obj.initiate_model_evaluation()
    logger.info(f">>>>>> stage {STAGE_NAME} completed <<<<<<\n\nx==========x")
except Exception as e:
    logger.exception(e)
    raise e
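
With that in place, running python main.py from the repo root is all it takes to kick this stage off, and the logger output tells you exactly where it succeeded or blew up.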

Results: What Do You Actually Get?

After running this, you’ll see:

  • A metrics.json file with your model’s RMSE, MAE, and R² scores. (Perfect for screenshots or sharing with your mentor!)

  • Logs in your terminal and log files, so you know what happened and where.

  • All your experiments, metrics, and models tracked on Dagshub’s MLflow dashboard. You can literally compare every run, see which parameters worked best, and download the best model whenever you want. Here’s the link to my repo: https://dagshub.com/mainiyash2/Predict-pipe

    metrics.json:

{
    "rmse": 0.6379414257638729,
    "mae": 0.492435366850595,
    "r2": 0.32618194013718615
}

On Dagshub:
You get a dashboard with all your runs. It’s like a high score table for your models. You can see which model did best, what parameters you used, and even download the model straight from the browser. My results are in the repo linked above.


In the next section, we’ll give the project a front end by writing a simple Flask app for it.
