ML Chapter 10: Model Selection & Boosting

Fatima Jannet
6 min read

Welcome to the final part of our journey.

After building our Machine Learning models, we're still left with a few questions:

  1. How do we handle the bias-variance tradeoff when building a model and checking its performance?

  2. How do we pick the best values for the hyperparameters (the parameters that are not learned)?

  3. How do we choose the best Machine Learning model for our business problem?

In this blog, I will answer these questions using Model Selection techniques, including:

  1. k-Fold Cross Validation

  2. Grid Search

We will end this course with a bonus section about one of the most powerful and popular Machine Learning models: XGBoost.

Model Selection

So far, we have partitioned our dataset into two parts: a training set and a test set. Now, let's discuss k-fold cross-validation. There are two main approaches:

  1. Some believe k-fold cross-validation alone is enough, without a test set.

  2. Others use k-fold cross-validation on the training set and still use the test set later.

In this tutorial, we'll follow the second approach, as it's more general. We'll simplify it for the first approach at the end.

You train your model on the training set and evaluate its performance on the test set, which the machine has never seen. But what if you just get lucky on this particular test set? What if the model does well on the test set but not on future data? That's where k-fold cross-validation helps.

We'll take the training set and split it into 10 folds. The technique is called k-fold cross-validation because k can be any number; for simplicity, we'll use k = 10.

Then we train the model on nine of these folds and keep the remaining fold aside as unseen data for validation.

Then we shift the validation fold: a different fold is held out, the model is retrained from scratch on the other nine folds, and validated on the new held-out fold, which again stays unseen during training.

We repeat this until each of the 10 folds has served as the validation fold exactly once.

We must use the same hyperparameters each time.

In effect, we're repeatedly training the model on slightly different data and validating it each time on a fold it never saw during training.

As a result, we end up with 10 sets of metrics to assess and aggregate. If the aggregated metrics look good, the modeling approach is valid: the model and hyperparameters we chose are suitable for this data.

And then, we'll train the model on all the training data and test it on the test set as usual. That's our final step.

If the aggregate metrics aren't good, we need to adjust the model's hyperparameters or change the model entirely.
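
To make this concrete, here's a minimal sketch of the 10-fold procedure using scikit-learn's KFold. The kernel SVM classifier and the X_train / y_train arrays are illustrative assumptions, not the exact code from this course (that's linked in the next section):

```python
# A minimal sketch of the 10-fold procedure, assuming NumPy arrays
# X_train, y_train and a kernel SVM classifier (both illustrative).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X_train):
    # Train on nine folds, using the same hyperparameters every time.
    model = SVC(kernel='rbf', random_state=0)
    model.fit(X_train[train_idx], y_train[train_idx])
    # Validate on the held-out fold, which the model has never seen.
    scores.append(model.score(X_train[val_idx], y_train[val_idx]))

print(np.mean(scores), np.std(scores))  # aggregate the 10 metrics
```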

Bias Variance Tradeoff

Let's understand Bias and Variance.

Let's see what we've learned so far:
We split our dataset into a training set and a test set; the training set is then divided into 10 folds. We train the model on nine folds and validate it on the remaining fold, repeating the process so each fold gets a turn as the validation fold.

If we plot this on the classic bias-variance dartboard diagram, here's what happens when you have high bias and low variance.

The bullseye is the target you want to predict. With high bias and low variance, your model's predictions are far from the target yet tightly clustered together. This means the model is too simple and misses the trend in the data.

Alternatively, with low bias and high variance, the predictions are scattered widely, but their average still hits the target.

Every time we slightly change the training data, the model's results vary.

Both scenarios are undesirable.

You might also have high bias and high variance, where predictions are both off-target and scattered.

And this is probably the worst of the three, where the model is too simple to capture the data's trend and too sensitive, capturing noise.

Here's the ideal scenario: your predictions are tightly clustered together (low variance) and centered on the right spot, so their average is exactly what we want to predict (low bias).

And this is a great model; it captures data trends accurately and generalizes well.

However, such models are rare, like a unicorn.

We often encounter high bias, low variance or low bias, high variance, and strive for low bias, low variance.

k-Fold Cross Validation in Python

Find and get your code here.

Let's code this up in Python on a CSV dataset.
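
Here's the core of the code, a short sketch assuming we've already split the data into X_train and y_train and fitted a classifier (as in the earlier blogs of this series):

```python
# k-fold cross-validation with scikit-learn's cross_val_score.
# `classifier`, X_train and y_train stand in for the model and data
# prepared earlier in the series.
from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))
```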

Running this gives us the average of the 10 accuracy scores, which is 90%. This average is a much more reliable measure of our model's performance than a single test-set score.

We also get a 6% standard deviation. What does this mean?

It means the individual accuracy scores typically deviate from the 90% average by about 6 percentage points.

This isn't too much variance, which is good. It means that most of the time, our model's performance will be between 84% and 96%.

So, this puts us in the low bias and low variance category.

There's another method, called Grid Search, that helps us choose the best hyperparameters for any ML model we build.

Find and get your code here.

We have taken two sets of hyperparameter values to try, as in the sketch below.
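
Here's a sketch of what that looks like with scikit-learn's GridSearchCV, again assuming a kernel SVM classifier; the exact grid values below are illustrative:

```python
# Grid Search over two sets of hyperparameter values: one for a
# linear kernel and one for an RBF kernel (illustrative grids).
from sklearn.model_selection import GridSearchCV

parameters = [
    {'C': [0.25, 0.5, 0.75, 1], 'kernel': ['linear']},
    {'C': [0.25, 0.5, 0.75, 1], 'kernel': ['rbf'],
     'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]},
]
grid_search = GridSearchCV(estimator=classifier,
                           param_grid=parameters,
                           scoring='accuracy',
                           cv=10)
grid_search.fit(X_train, y_train)
print("Best Accuracy: {:.2f} %".format(grid_search.best_score_ * 100))
print("Best Parameters:", grid_search.best_params_)
```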

We will check which values give us better accuracy. These values are chosen based on experience, and you'll develop a feel for them as you go.

We achieve a best accuracy of 90.67%, which is slightly better than the 90.33% we had before. In future projects, you'll see that even a small improvement in accuracy can make a difference. The best combination of parameters includes a regularization parameter C of 0.5, which means we needed to lower this parameter to reduce overfitting. We also got the best accuracy with an RBF kernel, and the best gamma value for this kernel is 0.6.

XGBoost

In this dataset, the task is to detect whether a breast cancer tumor is benign or malignant. Each row represents a patient, with features like clump thickness and cell uniformity. A class of two (2) means benign, and four (4) means malignant.

Logistic regression and k-Nearest Neighbors both achieved 94.7% accuracy. SVM had 94.1%, while kernel SVM improved to 95.3%. Naive Bayes also had 94.1%, but the decision tree was the best with 95.9%. Random forests lagged with 93.5% accuracy. [Note: I've used these models on this dataset in earlier blogs]

And so, what I want to do now, as you've probably guessed, is build an XGBoost model and train it on the same dataset to see if it can take the throne held by the decision tree classification model. In other words, to see if it can beat the accuracy obtained with the decision tree, as sketched below.
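
Here's a minimal sketch using the xgboost library's scikit-learn API, assuming the same train/test split used for the other models:

```python
# Train XGBoost on the same data and evaluate it on the test set.
# X_train, X_test, y_train, y_test are the split used for the other
# models. Note: recent xgboost versions expect class labels 0 and 1,
# so the dataset's 2 (benign) / 4 (malignant) labels may need
# remapping first, e.g. y = (y == 4).astype(int).
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

classifier = XGBClassifier()
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
```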

We get an impressive accuracy of 97.8% (computed from the confusion matrix). That's far better than the decision tree, so yes, XGBoost is number one here.

Now you can confidently begin your machine learning journey. I wish you great success in your future projects. May you provide strong insights and valuable analysis to your team and clients. With your powerful toolkit, you're ready to handle future challenges. I wish you the best and hope to see you in another course. Until then, enjoy machine learning.

Goodbye!
