Is the Model making right predictions? - Part 2 of 5 on Evaluation of Machine Learning Models


In the previous post in this series, we discussed accuracy as a metric, its limitations, and the confusion matrix. This post covers the metrics we can derive from the confusion matrix and how they serve as a better alternative to accuracy for classification problems.
Each cell of the confusion matrix has its own name. To understand the terminology, we need to relabel Class A and Class B from the previous example as Positive and Negative. Our matrix now looks like this:
|                 | Predicted Positive | Predicted Negative |
| --------------- | ------------------ | ------------------ |
| Actual Positive | 45                 | 5                  |
| Actual Negative | 12                 | 38                 |
When the actual label is positive and the predicted one is positive as well, that scenario is called a True Positive (TP). Similarly, when the actual label is negative and the predicted one is negative as well, that scenario is called a True Negative (TN).
The matrix will now look like this
|                 | Predicted Positive | Predicted Negative |
| --------------- | ------------------ | ------------------ |
| Actual Positive | True Positive (TP) | -                  |
| Actual Negative | -                  | True Negative (TN) |
When the prediction is negative but the actual label is positive, the scenario is called a False Negative (FN). Similarly, when the actual label is negative but the prediction is positive, it is a False Positive (FP).
|                 | Predicted Positive  | Predicted Negative  |
| --------------- | ------------------- | ------------------- |
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
A False Positive is sometimes referred to as a Type I Error, and a False Negative as a Type II Error. Why Type I and Type II? That will be discussed separately, as it is a large topic of its own.
With the terminology in place, let's derive a few metrics we can use.
Precision
Precision measures how correct the model is when it predicts a positive output. In other words, of all the times the model predicted positive, how many were indeed positive?
Mathematically, it can be written as
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
Simple, right?
Precision comes in handy when developing models for which being correct on a positive prediction is extremely important. One particular example is building a machine learning model for rare disease identification.
Let’s say you build a model for which you get
True Positives = 50
False Positives = 150
True Negatives = 9750
False Negatives = 50
Going by these numbers, we get an accuracy of 98%. A really great number, isn't it? But looking at the precision, we are correct only 25% of the times we say a person is positive for the disease.
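To make the arithmetic concrete, here is a small Python sketch (just a sanity check, not model code) that plugs the counts above into the accuracy and precision formulas:

```python
# Counts from the rare disease example above.
tp, fp, tn, fn = 50, 150, 9750, 50

accuracy = (tp + tn) / (tp + tn + fp + fn)   # (50 + 9750) / 10000
precision = tp / (tp + fp)                   # 50 / 200

print(f"Accuracy:  {accuracy:.0%}")   # 98%
print(f"Precision: {precision:.0%}")  # 25%
```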
Recall
Recall tells us what fraction of the actual positive examples the model manages to detect.
Mathematically, you can write it as
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
Looking at the same rare disease example, the model identifies only 50% of the positive samples. That means half of the people who are actually ill will get a negative test report and will not receive treatment on time. We don't want that to happen at all.
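Continuing the same sketch, recall for these counts works out as follows:

```python
# Same counts as before: 50 true positives, 50 false negatives.
tp, fn = 50, 50

recall = tp / (tp + fn)         # 50 / 100
print(f"Recall: {recall:.0%}")  # 50%
```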
A little detour: this concept will come up in great detail later in the series, but a basic idea is required to understand the next metric.
When we develop a machine learning model, we usually perform hyperparameter tuning to identify the set of hyperparameters that gives the best results. Automating that search is much simpler if we optimize for a single metric.
F1 Score
From the detour, you have probably guessed the point of this metric. It combines Precision & Recall into a single number, which makes it highly useful when both need to be optimized.
F1 Score is the harmonic mean of Precision & Recall. If you don't know what a harmonic mean is, it is defined by this formula:
$$\frac{2}{\text{F1 Score}} = \frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}$$
Rearranging gives the more familiar form:
$$\text{F1 Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
For the same rare disease example, plugging in a precision of 25% and a recall of 50% gives an F1 score of 33.3%.
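And as a quick sanity check of the formula, the same numbers in Python:

```python
# Precision and recall from the rare disease example.
precision, recall = 0.25, 0.50

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 Score: {f1:.1%}")  # 33.3%
```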