08 Model Training

Overview
The Model Trainer stage trains a machine learning model (ElasticNet) on the processed training dataset and saves the fitted model as an artifact for later stages. The testing dataset is loaded and split in the same step, although the fit itself uses only the training split.
Configuration
The pipeline uses a YAML configuration file to specify paths for storing model artifacts and locating the training and testing datasets.
# config/config.yaml
model_trainer:
  root_dir: artifacts/model_trainer
  train_data_path: artifacts/data_transformation/train.csv
  test_data_path: artifacts/data_transformation/test.csv
  model_name: model.joblib
root_dir: Directory where the trained model will be saved.
train_data_path: Path to the training data CSV file.
test_data_path: Path to the testing data CSV file.
model_name: Name of the saved model file.
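To make the relationship between these keys concrete, the sketch below joins root_dir and model_name into the final model path. It assumes the YAML has already been parsed into a plain dictionary (the parsing step is not shown), and model_path is a hypothetical helper introduced only for illustration.

```python
from pathlib import Path

# Assumption: this dict mirrors the model_trainer block of config.yaml
# after parsing; the parsing step itself is omitted here.
model_trainer_cfg = {
    "root_dir": "artifacts/model_trainer",
    "train_data_path": "artifacts/data_transformation/train.csv",
    "test_data_path": "artifacts/data_transformation/test.csv",
    "model_name": "model.joblib",
}


def model_path(cfg: dict) -> Path:
    """Hypothetical helper: join root_dir and model_name into one path."""
    return Path(cfg["root_dir"]) / cfg["model_name"]


# as_posix() keeps forward slashes regardless of the operating system
print(model_path(model_trainer_cfg).as_posix())
```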
Entity Definition
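We use a data class to define the configuration entity for model training. It gives every parameter a name and a declared type and exposes the values as attributes; note that dataclass annotations are not checked at runtime. Besides the paths from the YAML file, the entity carries the ElasticNet hyperparameters (alpha, l1_ratio) and the name of the target column.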
# src/Predict_Pipe/entity/config_entity.py
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelTrainerConfig:
    root_dir: Path
    train_data_path: Path
    test_data_path: Path
    model_name: str
    alpha: float
    l1_ratio: float
    target_column: str
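As a quick illustration of how the entity is used, the sketch below builds a ModelTrainerConfig by hand and reads values back from it. The class is repeated so the snippet runs on its own; the alpha, l1_ratio, and target_column values are placeholders chosen for the example, not values taken from the project.

```python
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ModelTrainerConfig:
    root_dir: Path
    train_data_path: Path
    test_data_path: Path
    model_name: str
    alpha: float
    l1_ratio: float
    target_column: str


# Placeholder values for illustration only (assumptions, not project settings)
example_config = ModelTrainerConfig(
    root_dir=Path("artifacts/model_trainer"),
    train_data_path=Path("artifacts/data_transformation/train.csv"),
    test_data_path=Path("artifacts/data_transformation/test.csv"),
    model_name="model.joblib",
    alpha=0.5,
    l1_ratio=0.5,
    target_column="target",
)

# Fields are plain attributes; asdict() turns the entity into a dictionary
print(example_config.model_name)
print(asdict(example_config)["alpha"])
```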
Configuration Manager
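The configuration manager reads the model_trainer section of the YAML configuration, takes the ElasticNet hyperparameters from the loaded parameters and the target column from the schema, creates the directory for model artifacts, and returns the configuration for the training component.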
# src/Predict_Pipe/config/configuration.py
def get_model_trainer_config(self) -> ModelTrainerConfig:
    config = self.config.model_trainer
    params = self.params.ElasticNet
    schema = self.schema.TARGET_COLUMN

    create_directories([config.root_dir])

    model_trainer_config = ModelTrainerConfig(
        root_dir=Path(config.root_dir),
        train_data_path=Path(config.train_data_path),
        test_data_path=Path(config.test_data_path),
        model_name=config.model_name,
        alpha=params.alpha,
        l1_ratio=params.l1_ratio,
        target_column=schema
    )
    return model_trainer_config
Ensures the root_dir exists before proceeding.
Returns a ModelTrainerConfig instance for use in the pipeline.
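The create_directories helper is called above but not shown in this section. The sketch below is one plausible way such a helper could behave, assuming it simply creates each path and tolerates directories that already exist; the project's actual implementation may differ.

```python
import os
import tempfile


def create_directories(paths: list) -> None:
    """Assumed behaviour: create each directory, skipping existing ones."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


# Demonstrate inside a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    root_dir = os.path.join(tmp, "artifacts", "model_trainer")
    create_directories([root_dir])
    create_directories([root_dir])  # a second call does not raise
    created = os.path.isdir(root_dir)

print(created)
```

Making the call idempotent means the stage can be re-run without first cleaning up the artifacts folder.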
Model Trainer Component
This component handles the actual training of the ElasticNet model.
# src/Predict_Pipe/components/model_trainer.py
import os

import joblib
import pandas as pd
from sklearn.linear_model import ElasticNet

from src.Predict_Pipe.entity.config_entity import ModelTrainerConfig
from src.Predict_Pipe.logging import logger


class ModelTrainer:
    def __init__(self, config: ModelTrainerConfig):
        self.config = config

    def train(self):
        # Load the splits produced by the data transformation stage
        train_data = pd.read_csv(self.config.train_data_path)
        test_data = pd.read_csv(self.config.test_data_path)

        # Separate the features (x) from the target column (y)
        train_x = train_data.drop([self.config.target_column], axis=1)
        test_x = test_data.drop([self.config.target_column], axis=1)
        train_y = train_data[[self.config.target_column]]
        test_y = test_data[[self.config.target_column]]

        # Fit ElasticNet with the configured hyperparameters
        lr = ElasticNet(alpha=self.config.alpha, l1_ratio=self.config.l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Persist the fitted model to <root_dir>/<model_name>
        joblib.dump(lr, os.path.join(self.config.root_dir, self.config.model_name))
        logger.info("Model trainer completed")
Reads the training and testing datasets from the specified paths.
Splits the data into features and target columns.
Trains an ElasticNet model using the provided hyperparameters.
Saves the trained model as a .joblib file in the specified directory.
Logs the completion of the training process.
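For readers wondering what the two hyperparameters control, the objective that scikit-learn's ElasticNet minimizes can be written as follows, where n is the number of training rows, w the coefficient vector, α the alpha value, and ρ the l1_ratio value (the intercept is omitted for brevity):

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

With ρ = 1 the penalty is purely L1 (Lasso-like), with ρ = 0 it is purely L2 (Ridge-like), and α scales the overall strength of the regularization.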
Pipeline Orchestration
The pipeline class wires the configuration manager to the trainer component so that the whole stage can be run with a single method call.
# src/Predict_Pipe/pipeline/model_trainer.py
from src.Predict_Pipe.config.configuration import ConfigurationManager
from src.Predict_Pipe.components.model_trainer import ModelTrainer
from src.Predict_Pipe.logging import logger

STAGE_NAME = "Model Trainer stage"


class ModelTrainerTrainingPipeline:
    def __init__(self):
        pass

    def initiate_model_training(self):
        config = ConfigurationManager()
        model_trainer_config = config.get_model_trainer_config()
        model_trainer = ModelTrainer(config=model_trainer_config)
        model_trainer.train()
Retrieves the model trainer configuration.
Initializes the ModelTrainer component.
Runs the training process.
Main Execution
The main script runs the model trainer pipeline and logs the progress.
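# main.py
from src.Predict_Pipe.logging import logger
from src.Predict_Pipe.pipeline.model_trainer import ModelTrainerTrainingPipeline

STAGE_NAME = "Model Trainer stage"

try:
    logger.info(f">>>>>> stage {STAGE_NAME} started <<<<<<")
    obj = ModelTrainerTrainingPipeline()
    obj.initiate_model_training()
    logger.info(f">>>>>> stage {STAGE_NAME} completed <<<<<<\n\nx==========x")
except Exception as e:
    # Log the full traceback, then re-raise so the run fails visibly
    logger.exception(e)
    raise e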
Starts the model trainer stage.
Logs the start and completion of the stage.
Handles and logs any exceptions.
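The same start, run, log, and re-raise pattern repeats for every stage, so it could be factored into a small helper. The sketch below is one way to do that; run_stage is a hypothetical function, and it uses the standard logging module as a stand-in for the project's own logger.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("stage_runner")  # stand-in for the project's logger


def run_stage(stage_name, stage_fn):
    """Hypothetical helper: run one stage, log progress, re-raise failures."""
    try:
        logger.info(">>>>>> stage %s started <<<<<<", stage_name)
        result = stage_fn()
        logger.info(">>>>>> stage %s completed <<<<<<", stage_name)
        return result
    except Exception:
        # logger.exception records the traceback before the error propagates
        logger.exception("stage %s failed", stage_name)
        raise


def failing_stage():
    raise ValueError("training data missing")


# A stage that succeeds returns its result unchanged
outcome = run_stage("Demo stage", lambda: "ok")

# A stage that fails is logged and the exception still reaches the caller
try:
    run_stage("Failing stage", failing_stage)
    propagated = False
except ValueError:
    propagated = True
```

In main.py, a call such as run_stage(STAGE_NAME, ModelTrainerTrainingPipeline().initiate_model_training) would then replace the explicit try/except block.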
Artifacts and Logs
Artifacts: The trained model (model.joblib) is saved in the artifacts/model_trainer directory.
Logs: All logs are stored in the logs directory for easy debugging and tracking.
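To check that a saved artifact is usable, the model can be loaded back with joblib and asked for predictions. The sketch below demonstrates the round trip on synthetic data in a temporary directory; the data, the alpha and l1_ratio values, and the file location are all assumptions made for the example, not the project's real dataset or settings.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Placeholder hyperparameters, not the values used by the project
model = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=42)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_file = os.path.join(tmp, "model.joblib")
    joblib.dump(model, model_file)      # same call the trainer uses
    reloaded = joblib.load(model_file)  # what a later stage would do

# The reloaded model should reproduce the original predictions
same_predictions = bool(np.allclose(model.predict(X), reloaded.predict(X)))
print(same_predictions)
```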
In the next section, we’ll use the trained model to evaluate and predict.
Written by Yash Maini
