πŸ“˜ Mastering Hyperparameter Tuning in Machine Learning

Tilak Savani
5 min read


🧠 Introduction

In machine learning, achieving high model accuracy isn't just about choosing the right algorithmβ€”tuning hyperparameters can make a huge difference. Hyperparameters are settings that govern the learning process and directly affect model performance.


❓ Why Hyperparameter Tuning is Important

Hyperparameter tuning helps:

  • Boost accuracy

  • Avoid underfitting/overfitting

  • Improve generalization

  • Speed up convergence

Poorly chosen hyperparameters can undermine even the best algorithm and the cleanest data.


βš™οΈ Common Hyperparameters in ML Models

Algorithm      | Common Hyperparameters
---------------|---------------------------------------------------
Random Forest  | n_estimators, max_depth, min_samples_split
SVM            | C, kernel, gamma
XGBoost        | learning_rate, n_estimators, max_depth, subsample
KNN            | n_neighbors, weights, metric
Neural Nets    | learning_rate, batch_size, epochs, activation
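
In scikit-learn, these are simply constructor arguments, fixed before training begins. A minimal sketch:

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hyperparameters are set before training, as constructor arguments
rf = RandomForestClassifier(n_estimators=100, max_depth=5, min_samples_split=2)
svm = SVC(C=1.0, kernel='rbf', gamma='scale')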

πŸ” Hyperparameter Tuning Techniques

There are several strategies to tune hyperparameters. Each has its trade-offs in terms of accuracy, time, and complexity.

1️⃣ Grid Search

πŸ“Œ What is it?

Tries all possible combinations of specified hyperparameter values.

πŸ“¦ Example:

If n_estimators = [50, 100] and max_depth = [3, 5], Grid Search tries:

  • (50, 3)

  • (50, 5)

  • (100, 3)

  • (100, 5)
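
Conceptually, Grid Search just walks the Cartesian product of the value lists. A minimal sketch:

from itertools import product

n_estimators = [50, 100]
max_depth = [3, 5]

# The full Cartesian product: 2 x 2 = 4 combinations
for combo in product(n_estimators, max_depth):
    print(combo)  # (50, 3), (50, 5), (100, 3), (100, 5)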

βœ… Pros:

  • Exhaustive, can find the optimal combo

❌ Cons:

  • Computationally expensive

  • Not scalable to large hyperparameter spaces

2️⃣ Random Search

πŸ“Œ What is it?

Samples random combinations from a defined space.

πŸ“¦ Example:

From the same grid above, it might try only 2 or 3 random combinations instead of all 4.

βœ… Pros:

  • More efficient than grid search

  • Often finds good results faster

❌ Cons:

  • May miss optimal settings

3️⃣ Bayesian Optimization

πŸ“Œ What is it?

Uses probabilistic models (like Gaussian Processes) to model the function and decide where to search next.

βœ… Pros:

  • Smarter and faster than random/grid search

  • Fewer evaluations needed

❌ Cons:

  • Complex to implement

  • Slower per iteration due to the overhead of building the surrogate model
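
A minimal sketch using BayesSearchCV from scikit-optimize (this assumes the scikit-optimize package is installed; it is not part of scikit-learn itself):

from skopt import BayesSearchCV
from skopt.space import Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# The surrogate model uses past scores to pick the next point to evaluate
search_space = {
    'n_estimators': Integer(50, 200),
    'max_depth': Integer(3, 10),
    'min_samples_split': Integer(2, 10)
}

opt = BayesSearchCV(RandomForestClassifier(), search_space,
                    n_iter=20, cv=3, scoring='accuracy', random_state=42)
opt.fit(X, y)

print("Best Parameters:", opt.best_params_)
print("Best CV Score:", opt.best_score_)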

4️⃣ Genetic Algorithms (GA)

πŸ“Œ What is it?

Inspired by evolution. Uses selection, crossover, and mutation to evolve better hyperparameter combinations.

βœ… Pros:

  • Handles complex search spaces

  • Works well with large models

❌ Cons:

  • Requires tuning GA parameters

  • Computationally heavy
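
Libraries such as TPOT wrap this idea for you; purely as a self-contained toy illustration, here is a tiny GA evolving two Random Forest hyperparameters (the population size, generation count, and mutation steps are arbitrary choices for the sketch):

import random
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def fitness(params):
    # Mean CV accuracy of one candidate (n_estimators, max_depth)
    n_estimators, max_depth = params
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=42)
    return cross_val_score(model, X, y, cv=3).mean()

def mutate(params):
    # Randomly nudge one hyperparameter
    n_estimators, max_depth = params
    if random.random() < 0.5:
        n_estimators = max(10, n_estimators + random.choice([-20, 20]))
    else:
        max_depth = max(2, max_depth + random.choice([-1, 1]))
    return (n_estimators, max_depth)

# Initial random population of (n_estimators, max_depth) pairs
population = [(random.randint(10, 200), random.randint(2, 10)) for _ in range(8)]

for generation in range(5):
    # Selection: keep the top half by fitness
    survivors = sorted(population, key=fitness, reverse=True)[:4]
    # Crossover + mutation: children mix genes from two random survivors
    children = []
    while len(children) < 4:
        p1, p2 = random.sample(survivors, 2)
        child = (p1[0], p2[1])          # crossover: one gene from each parent
        children.append(mutate(child))  # mutation
    population = survivors + children

print("Best (n_estimators, max_depth):", max(population, key=fitness))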

5️⃣ Hyperband

πŸ“Œ What is it?

A resource-aware algorithm that allocates more resources to better-performing configurations and eliminates poor ones early.

βœ… Pros:

  • Extremely efficient

  • Good for deep learning models

❌ Cons:

  • May need early-stopping criteria

  • Implementation more complex than Grid/Random Search
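
Scikit-learn does not ship Hyperband itself, but its HalvingRandomSearchCV implements the closely related successive-halving strategy: many candidates start with few resources, and only the best are promoted. A minimal sketch:

from sklearn.experimental import enable_halving_search_cv  # noqa: enables the halving API
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from scipy.stats import randint

X, y = load_iris(return_X_y=True)

param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': randint(2, 10)
}

# factor=3: only the top third of candidates survives each round
halving = HalvingRandomSearchCV(RandomForestClassifier(random_state=42),
                                param_dist, factor=3, cv=3,
                                scoring='accuracy', random_state=42)
halving.fit(X, y)

print("Best Parameters:", halving.best_params_)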


πŸ› οΈ Sklearn Example with GridSearchCV & RandomizedSearchCV

Let’s walk through practical examples using Scikit-learn. We’ll tune hyperparameters for a Random Forest Classifier.

βœ… Step-by-Step: GridSearchCV

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the model
model = RandomForestClassifier()

# Define hyperparameter grid
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [3, 5, None],
    'min_samples_split': [2, 5]
}

# Setup GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy')

# Fit
grid_search.fit(X_train, y_train)

# Best Parameters & Accuracy
print("Best Parameters:", grid_search.best_params_)
y_pred = grid_search.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

🎲 Step-by-Step: RandomizedSearchCV

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Define parameter distribution
param_dist = {
    'n_estimators': randint(50, 200),
    'max_depth': [3, None],
    'min_samples_split': randint(2, 10)
}

# Setup RandomizedSearchCV
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist,
                                   n_iter=10, cv=5, scoring='accuracy', random_state=42)

# Fit
random_search.fit(X_train, y_train)

# Best Parameters & Accuracy
print("Best Parameters:", random_search.best_params_)
y_pred = random_search.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

πŸ”§ Hyperparameter Tuning in XGBoost & LightGBM

🐼 XGBoost Hyperparameter Tuning Example

import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the model (use_label_encoder was deprecated and later removed in XGBoost)
model = xgb.XGBClassifier(eval_metric='logloss')

# Define hyperparameter grid
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.8, 1.0]
}

# GridSearchCV
grid = GridSearchCV(model, param_grid, cv=3, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best Params:", grid.best_params_)
print("Best Score:", grid.best_score_)

🌟 LightGBM Hyperparameter Tuning Example

import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV

# Define the model
model = lgb.LGBMClassifier()

# Define hyperparameter space
param_dist = {
    'num_leaves': [31, 50, 100],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 500],
    'max_depth': [-1, 10, 20],
    'min_child_samples': [10, 20, 30]
}

# RandomizedSearchCV (reusing the breast cancer split from the XGBoost example)
random_search = RandomizedSearchCV(model, param_distributions=param_dist,
                                   n_iter=10, cv=3, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

print("Best Params:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

πŸ’‘ Best Practices for Tuning

  • Always start with a baseline model to compare improvements.

  • Use RandomizedSearchCV for large search spaces; it's faster.

  • Try cross-validation (cv=5 or cv=10) for stable results.

  • Use domain knowledge to narrow down ranges.

  • Tune in stages: e.g., first n_estimators, then max_depth, then learning_rate.
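
As a rough illustration of staged tuning (reusing GridSearchCV, RandomForestClassifier, and the train/test split from the examples above):

# Stage 1: tune n_estimators alone
stage1 = GridSearchCV(RandomForestClassifier(),
                      {'n_estimators': [50, 100, 200]}, cv=5)
stage1.fit(X_train, y_train)
best_n = stage1.best_params_['n_estimators']

# Stage 2: freeze n_estimators, then tune max_depth
stage2 = GridSearchCV(RandomForestClassifier(n_estimators=best_n),
                      {'max_depth': [3, 5, 10, None]}, cv=5)
stage2.fit(X_train, y_train)
print("Staged best:", {'n_estimators': best_n, **stage2.best_params_})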


πŸ“Š Comparison Table: Techniques vs Time vs Accuracy

Technique             | Time Efficiency | Accuracy Boost | Best For
----------------------|-----------------|----------------|------------------------------
Grid Search           | ❌ Slow         | βœ… High        | Small, specific ranges
Random Search         | βœ… Faster       | βœ… Good        | Large search spaces
Bayesian Optimization | βœ…βœ… Very Fast  | βœ…βœ… Excellent | Complex models
Genetic Algorithm     | βœ… Medium       | βœ… High        | Exploring complex spaces
Hyperband             | βœ…βœ… Very Fast  | βœ… Good        | Large models, early stopping

βœ… Pros

  • Boosts model performance.

  • Helps discover optimal configurations.

  • Works for any ML algorithm.


⚠️ Cons

  • Can be computationally expensive.

  • Risk of overfitting if not validated properly.

  • Some techniques (e.g., Bayesian) require advanced libraries.


🧩 Final Thoughts

Hyperparameter tuning is essential to extract the best performance from your models. While it's easy to overlook, it can often be the difference between a mediocre and a top-performing model. Start simple, validate thoroughly, and scale tuning as your models grow more complex.


πŸ“¬ Subscribe

If you found this guide helpful, follow for more content on Machine Learning, Deep Learning, and Model Optimization!

Thanks for Reading πŸ˜€.
