Is the Model Making the Right Predictions? - Part 4 of 5 on Evaluation of Machine Learning Models

Japkeerat Singh
6 min read

When it comes to evaluating regression-based machine learning models, picking the right metric is like picking the right seasoning for your dish: the wrong choice can leave a bitter taste. We have already covered the metrics you can use to evaluate classification-based machine learning models. In this article and the next, we will focus solely on metrics for regression-based models.

Metrics for Regression Models: The Basics

Unlike classification models, regression models deal with predicting continuous values. This means we’re less concerned with thresholds and more focused on how far off the predictions are from the actual values.

Ideally, if we just want to see how far off a prediction is, we could take the difference between the actual and predicted values and call it a day. But with machine learning, there are no ideal scenarios. Extend this idea to thousands or millions of data points, and you will sometimes get an average error of 0. Does that mean your model is performing exceptionally well? Nope. It could be that the model's overestimates and underestimates simply cancel out, because their cumulative magnitudes on either side of the actual values are equal.

For instance, imagine a model predicting daily temperatures where the actual temperature is 30°C. If the model predicts 40°C on one day and 20°C on another, the average error might come out to 0, but it’s clearly far from accurate! This highlights the importance of using more robust metrics to assess the performance of regression models.
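To make the cancellation concrete, here is a minimal sketch in plain Python of that two-day temperature example; the signed errors average to zero while the absolute errors tell the real story:

```python
# Two days where the actual temperature is 30°C both times
actual = [30, 30]
predicted = [40, 20]   # over-shoots one day, under-shoots the next

signed_errors = [p - a for p, a in zip(predicted, actual)]

mean_signed_error = sum(signed_errors) / len(signed_errors)
mean_absolute_error = sum(abs(e) for e in signed_errors) / len(signed_errors)

print(mean_signed_error)    # 0.0  -> looks "perfect", but isn't
print(mean_absolute_error)  # 10.0 -> off by 10°C on average
```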

Keeping this in mind, let's look at how we can carefully craft metrics that work well for regression models.

Mean Absolute Error

The Mean Absolute Error (MAE) is one of the simplest yet highly interpretable metrics for evaluating regression models. MAE calculates the average of the absolute differences between the actual and predicted values, making it easy to understand and less sensitive to outliers compared to some other metrics.

Formula

The formula for MAE is as follows:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^n \lvert y_i - \hat{y}_i \rvert$$

where:

  • $n$ is the total number of observations.

  • $y_i$ represents the true value.

  • $\hat{y}_i$ represents the predicted value.

In simple terms, MAE takes each difference, removes its sign (discarding the direction of the error), sums the absolute errors, and averages them to get the mean error of the model.
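Here is a short sketch of MAE in Python, computed both by hand with NumPy and via scikit-learn's mean_absolute_error (the toy values are made up purely for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # toy actual values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # toy predictions

# MAE by hand: average of the absolute differences
mae_manual = np.mean(np.abs(y_true - y_pred))

# The same metric via scikit-learn
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # both print 0.5
```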

Mean Squared Error

MAE introduces an important idea for calculating error: discarding its direction. MSE does exactly the same thing, with one change. Instead of taking the absolute value of each difference, it squares it. This again removes the direction, so the emphasis stays solely on the magnitude of the error.

Formula

The formula for MSE is

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2$$

Why MSE when MAE exists?

  1. Squaring the error makes the metric sensitive to the magnitude of the error. It amplifies large errors, so even a small share of large errors has a profound impact on the MSE value.

  2. Because of this squaring, MSE is much more sensitive to outliers than MAE.

  3. MAE is minimized when the predictions tend toward the median of the target values, while MSE is minimized when they tend toward the mean (both effects are demonstrated in the sketch below).
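Here is a minimal sketch (toy numbers, assuming NumPy and scikit-learn) demonstrating points 2 and 3: a single outlier inflates MSE far more than MAE, and a constant prediction at the median minimizes MAE while the mean minimizes MSE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Point 2: a single outlier inflates MSE far more than MAE
y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred = np.array([11.0, 11.0, 12.0, 12.0, 13.0])   # every prediction off by 1

print(mean_absolute_error(y_true, y_pred))  # 1.0
print(mean_squared_error(y_true, y_pred))   # 1.0

y_pred_outlier = y_pred.copy()
y_pred_outlier[0] = 21.0                    # one prediction off by 11

print(mean_absolute_error(y_true, y_pred_outlier))  # 3.0  -> grows linearly
print(mean_squared_error(y_true, y_pred_outlier))   # 25.0 -> grows quadratically

# Point 3: on a skewed target, predicting the median minimizes MAE,
# while predicting the mean minimizes MSE
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])           # median = 3, mean = 22
for constant in (np.median(y), np.mean(y)):
    c = np.full_like(y, constant)
    print(constant,
          mean_absolute_error(y, c),   # lowest when constant = median
          mean_squared_error(y, c))    # lowest when constant = mean
```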

R-Squared (R²) Metric

Now, let’s talk about a metric that is widely recognized and frequently used in the evaluation of regression models: R-squared (R²). While Mean Absolute Error (MAE) and Mean Squared Error (MSE) are good at measuring the magnitude of errors, R-squared takes a different approach. It gives us a measure of how well the regression model fits the data, helping us understand how much of the variance in the target variable is explained by the model.

What Does R-Squared Measure?

R², also called the coefficient of determination, is essentially a percentage that tells us how well the regression model explains the variability of the target variable. A high R² indicates that the model captures most of the variance, whereas a low R² suggests that the model is missing the mark and not explaining much of the variability in the data.

To understand this better, let’s break it down. The idea behind R² is based on comparing two things:

  1. The total sum of squares (TSS): This measures how much the actual values vary from the mean of the target variable.

  2. The residual sum of squares (RSS): This measures how much the predicted values deviate from the actual values.

R² is then calculated using the formula:

$$R^2 = 1 - \frac{RSS}{TSS}$$

Where:

  • RSS (Residual Sum of Squares) is the sum of the squared differences between the observed values and the predicted values. It is mathematically written as

$$RSS = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$

  • TSS (Total Sum of Squares) is the sum of the squared differences between the observed values and the mean of the observed values. It is mathematically written as

$$TSS = \sum_{i=1}^n (y_i - \bar{y})^2$$
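Putting the two pieces together, here is a minimal sketch (toy numbers, assuming NumPy and scikit-learn) that computes R² from RSS and TSS and checks it against scikit-learn's r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 7.1, 8.6])

rss = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
r2_manual = 1 - rss / tss

print(r2_manual, r2_score(y_true, y_pred))      # both print 0.985
```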

Interpretation of R-Squared

  • R² = 1: The model perfectly fits the data. All data points fall exactly on the regression line.

  • R² = 0: The model doesn’t explain any of the variance in the data. Essentially, it’s no better than predicting the mean of the target variable for all instances.

  • R² < 0: This happens when the model performs worse than simply predicting the mean of the target variable for every instance. It indicates a poor fit; on held-out data, a negative R² is often a symptom of a model that failed to generalize.
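The last two cases are easy to reproduce with a quick sketch (toy numbers, assuming scikit-learn):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([2.0, 4.0, 6.0, 8.0])

# Predicting the mean for every instance gives R² = 0
mean_pred = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_pred))   # 0.0

# Predictions that do worse than the mean give a negative R²
bad_pred = np.array([8.0, 6.0, 4.0, 2.0])   # anti-correlated with the truth
print(r2_score(y_true, bad_pred))    # -3.0
```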

The Pros and Cons of R-Squared

Pros:

  1. Easy to Interpret: R² is easy to explain to stakeholders because it represents a percentage of the variance explained by the model.

  2. Model Comparison: It helps in comparing different models for the same dataset. A higher R² value typically indicates a better model fit.

Cons:

  1. Not Robust to Overfitting: A higher R² doesn’t always mean a better model. It can sometimes mislead when dealing with overfitting, as a model might perfectly fit the training data but fail to generalize well to unseen data.

  2. Doesn't Handle Non-linearity Well: R² assumes a linear relationship between the independent variables and the dependent variable. It may not be appropriate for complex models that don't exhibit a linear pattern.

  3. Sensitive to Noisy Data: Since R² is based on sums of squares, small changes in the data can cause large swings in its value, especially when the data is noisy.

Adjusted R-Squared: A Better Alternative?

While R² is useful, it has a significant drawback: it always increases when you add more variables to the model, even if those variables aren’t actually contributing useful information. This means that R² can give you an inflated sense of the model's quality when you're working with multiple features.

To address this issue, Adjusted R-squared comes into play. Adjusted R² adjusts the statistic based on the number of predictors in the model. It is particularly useful when comparing models with different numbers of features.

The formula for Adjusted R² is:

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

Where:

  • n is the number of data points.

  • p is the number of independent variables (predictors).
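As of writing, scikit-learn does not ship a built-in adjusted R² metric, so here is a small helper sketch (the helper name and toy data are my own, assuming NumPy and scikit-learn) that applies the formula on top of r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, p):
    """Adjusted R² for a model fitted with p predictors (hypothetical helper)."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
y_pred = np.array([2.8, 5.3, 7.1, 8.6, 11.4, 12.9])

print(r2_score(y_true, y_pred))        # plain R²
print(adjusted_r2(y_true, y_pred, 2))  # slightly lower: penalized for 2 predictors
```

Note that for a fixed R², increasing p always lowers the adjusted value, which is exactly the penalty on uninformative predictors described above.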

When to Use R-Squared?

R² is most effective when you're dealing with linear regression models and are trying to evaluate the model's ability to capture the relationship between input features and the target variable. It can also be useful when comparing models on the same dataset. However, it's important to remember that a high R² doesn’t always indicate a good model, especially if the model is overfitting the data or fails to generalize well to new data.

In summary, R² is a valuable metric in regression analysis, but it should always be considered alongside other metrics like MAE and MSE, as well as techniques like cross-validation, to get a fuller picture of model performance.


Written by

Japkeerat Singh

Hi, I am Japkeerat. I have been working as a Machine Learning Engineer since January 2020, straight out of college. During this period, I've worked on some extremely challenging projects: Security Vulnerability Detection using Graph Neural Networks, user segmentation to improve the click-through rate of notifications, and MLOps infrastructure development for startups, to name a few. I keep my articles precise, with a maximum of 4 minutes of reading time. I'm currently writing 2 series: one for beginners in Machine Learning and another on more advanced concepts. If you subscribe to the newsletter, you'll get 1 article every Thursday on the advanced concepts.