Is the Model making right predictions? - Part 2 of 5 on Evaluation of Machine Learning Models


In the previous post in this series, we discussed accuracy as a metric, its limitations, and the confusion matrix. This post covers the metrics we can derive from the confusion matrix and how they serve as a better alternative to accuracy for classification problems.
Each cell of the confusion matrix has its own name. To understand the terminology, we need to relabel Class A and Class B from the previous example as Positive and Negative. Our matrix now looks like this:
|                 | Predicted Positive | Predicted Negative |
| --------------- | ------------------ | ------------------ |
| Actual Positive | 45                 | 5                  |
| Actual Negative | 12                 | 38                 |
When the actual label is positive and the predicted one is positive as well, that scenario is called a True Positive (TP). Similarly, when the actual label is negative and the predicted one is negative as well, that scenario is called a True Negative (TN).
The matrix will now look like this
|                 | Predicted Positive | Predicted Negative |
| --------------- | ------------------ | ------------------ |
| Actual Positive | True Positive (TP) | -                  |
| Actual Negative | -                  | True Negative (TN) |
When the prediction is negative but the actual label is positive, the scenario is called a False Negative (FN). Similarly, when the actual label is negative but the prediction is positive, it is a False Positive (FP).
|                 | Predicted Positive  | Predicted Negative  |
| --------------- | ------------------- | ------------------- |
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
A False Positive is sometimes referred to as a Type I Error, and a False Negative as a Type II Error. Why Type I and Type II? That will be discussed separately, as it is a large topic of its own.
With the terminology in place, let's derive a few metrics we can use.
Precision
Precision measures how correct the model is when it predicts a positive output. In other words, of all the times the model predicted positive, how many were indeed positive?
Mathematically, it can be written as
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
Simple, right?
Precision comes in handy when developing models for which being correct on a positive prediction is extremely important. One particular example is building a machine learning model for rare disease identification.
Let’s say you build a model for which you get
True Positives = 50
False Positives = 150
True Negatives = 9750
False Negatives = 50
Going by these numbers, we get an accuracy of 98%. A really great number, isn't it? But looking at the precision, we are correct only 25% of the times we say a person is positive for the disease.
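To make the arithmetic concrete, here is a small Python sketch (just a sanity check, not model code) that plugs the counts above into the accuracy and precision formulas:

```python
# Counts from the rare disease example above.
tp, fp, tn, fn = 50, 150, 9750, 50

accuracy = (tp + tn) / (tp + tn + fp + fn)   # (50 + 9750) / 10000
precision = tp / (tp + fp)                   # 50 / 200

print(f"Accuracy:  {accuracy:.0%}")   # 98%
print(f"Precision: {precision:.0%}")  # 25%
```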
Recall
Recall tells us what fraction of the actual positive examples the model manages to detect.
Mathematically, you can write it as
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
Looking at the same rare disease example, the model identifies only 50% of the positive samples. That means half of the people who are actually ill will get a negative test report and will not receive treatment on time. We don't want that to happen at all.
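Continuing the same sketch, recall for these counts works out as follows:

```python
# Same counts as before: 50 true positives, 50 false negatives.
tp, fn = 50, 50

recall = tp / (tp + fn)         # 50 / 100
print(f"Recall: {recall:.0%}")  # 50%
```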
A little detour: this concept will come up in great detail later in the series, but a basic idea is required to understand the next metric.
When we develop a machine learning model, we usually perform hyperparameter tuning to identify the set of hyperparameters that gives the best results. Automating that search is much simpler if we optimize for a single metric.
F1 Score
From the detour, you have probably guessed the point of this metric. It combines Precision & Recall into a single number, which makes it highly useful when both need to be optimized.
F1 Score is the harmonic mean of Precision & Recall. If you don't know what a harmonic mean is, it is defined by this formula:
$$\frac{2}{\text{F1 Score}} = \frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}$$
Rearranging gives the more familiar form:
$$\text{F1 Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
For the same rare disease example, plugging in a precision of 25% and a recall of 50% gives an F1 score of 33.3%.
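And as a quick sanity check of the formula, the same numbers in Python:

```python
# Precision and recall from the rare disease example.
precision, recall = 0.25, 0.50

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 Score: {f1:.1%}")  # 33.3%
```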