Is the Model making right predictions? - Part 2 of 5 on Evaluation of Machine Learning Models

Japkeerat Singh

We have already discussed accuracy as a metric, its limitations, and the confusion matrix in the previous post in the series. This post covers the metrics we can derive from the confusion matrix and how they serve as better alternatives to accuracy for classification problems.

Each cell of the confusion matrix has a name of its own. To understand the terminology, we need to relabel Class A and Class B from the previous example as Positive and Negative. Our matrix would now look something like this:

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | 45                 | 5                  |
| Actual Negative | 12                 | 38                 |

When the actual label is positive and the predicted one is positive as well, that scenario is called a True Positive (TP). Similarly, when the actual label is negative and the predicted one is negative as well, that scenario is called a True Negative (TN).

The matrix will now look like this

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | True Positive (TP) | -                  |
| Actual Negative | -                  | True Negative (TN) |

When the prediction is negative but the actual label is positive, the scenario is called a False Negative (FN), and similarly, when the actual label is negative but the prediction is positive, the scenario becomes a False Positive (FP).

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |

A False Positive is sometimes referred to as a Type I Error, and a False Negative as a Type II Error. Why Type I and Type II? That will be discussed separately, as it is a large topic of its own.

With the terminology covered, let’s derive a few metrics that we can use.
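If you prefer to see these counts in code, here is a minimal sketch assuming scikit-learn is available; the label arrays are made up purely for illustration.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only; 1 = Positive, 0 = Negative.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Passing labels=[1, 0] makes the layout match the tables above:
# rows are the actual labels, columns are the predictions.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn = cm[0]
fp, tn = cm[1]
print(tp, fn, fp, tn)
```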

Precision

Precision explains how precise the model is when predicting a positive output. In other words, out of all the times the machine learning model predicted positive, how many times was the label indeed positive?

Mathematically, it can be written as

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$

Simple, right?

Precision comes in handy when developing machine learning models for which being correct on a positive prediction is extremely important. One particular example is building a machine learning model for rare disease identification.

Let’s say you build a model for which you get

  • True Positives = 50

  • False Positives = 150

  • True Negatives = 9750

  • False Negatives = 50

Going by these numbers, we get an accuracy of 98%. A really great number, isn’t it? But looking at precision, we are correct only 25% of the time when we say a person is positive for the disease.
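As a quick sanity check of that arithmetic, here is the same calculation in Python, using the counts from the list above.

```python
# Counts from the rare disease example above.
tp, fp, tn, fn = 50, 150, 9750, 50

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)

print(f"Accuracy:  {accuracy:.0%}")   # 98%
print(f"Precision: {precision:.0%}")  # 25%
```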

Recall

Recall tells us what fraction of the actual positive examples the model is able to detect.

Mathematically, you can write it as

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$

Looking at the same example of rare disease identification, we see that the model identifies only 50% of the positive samples. Meaning, 50% of the people who are actually ill will get a negative test report, which means they will not receive treatment for the disease on time. We don’t want that to happen at all.
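The same counts confirm the recall figure; a one-liner in Python:

```python
tp, fn = 50, 50

recall = tp / (tp + fn)
print(f"Recall: {recall:.0%}")  # 50%
```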


A little detour: this concept will come up in much greater detail later in the series, but a basic idea is required to understand the next metric.

When we develop a machine learning model, we usually perform hyperparameter tuning to identify the set of hyperparameters that gives the best results. Automating this is much simpler if we optimize for a single metric, as the sketch below shows.
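As a rough illustration, assuming scikit-learn, a grid search can be told which single metric to optimize through its `scoring` argument; the estimator and parameter grid below are placeholders, not a recommendation.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical estimator and grid; the point is that `scoring`
# fixes the single metric the search optimizes.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="f1",  # optimize for F1 instead of the default accuracy
    cv=5,
)
# search.fit(X_train, y_train)  # X_train and y_train are assumed to exist
```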


F1 Score

From the detour, you have probably gotten the gist of what this metric is: it combines Precision and Recall into a single number. It is highly useful for scenarios where both Precision and Recall need to be optimized.

F1 Score is the harmonic mean of Precision and Recall. If you don’t know what the harmonic mean is, it’s this formula:

$$\frac{2}{\text{F1 Score}} = \frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}$$

For the same use case as above, plugging in the precision and recall gives an F1 score of 33.3%.
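Rearranging the harmonic-mean formula above gives the more common form, which is easy to verify in Python with the numbers from our example.

```python
precision, recall = 0.25, 0.50

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 Score: {f1:.1%}")  # 33.3%
```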
