Regression Loss Functions All Machine Learners Should Know
When building a regression model, one of the most crucial decisions is choosing an appropriate loss function. The loss function guides the optimization process during training, steering the model toward the underlying patterns in the data. It quantifies how well your model’s predictions match the actual values, and the goal of training is to minimize this error.
In this blog, we will dive into five essential regression loss functions that every machine learner should know, and discuss how to choose the right one for your problem context.
1. Mean Squared Error (MSE)
Mean Squared Error is the most common loss function for regression tasks. It calculates the average of the squared differences between predicted and actual values:
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where:

• $y_i$: actual value

• $\hat{y}_i$: predicted value

• $n$: number of data points
| Pros | Cons |
| --- | --- |
| Penalizes larger errors more severely, making it useful when large deviations are particularly undesirable. | Sensitive to outliers: because errors are squared, a few large errors can disproportionately influence the result. |
| Differentiable and convex, making it easy to optimize with gradient-based methods. | |
When to use: MSE is a solid default choice when you want to penalize large errors heavily. It is especially appropriate when the errors are normally distributed.
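As a quick illustration, here is a minimal NumPy sketch of MSE (the function name and example values are made up for demonstration):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average of squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))

# A single prediction that is off by 3 dominates the average
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.4, 10.0])
print(mse(y_true, y_pred))  # (0.04 + 0.01 + 0.01 + 9.0) / 4 = 2.265
```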
2. Mean Absolute Error (MAE)
Mean Absolute Error is the average of the absolute differences between predicted and actual values:
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
| Pros | Cons |
| --- | --- |
| Robust to outliers, as it does not square the error terms. | Does not penalize large deviations as strongly as MSE. |
| Provides a direct measure of average error in the same units as the target variable. | The optimization landscape is non-smooth (the gradient is constant and undefined at zero), which can slow gradient-based optimization. |
When to use: MAE is suitable when you want a model that is less sensitive to outliers, or when you want to interpret the error directly in the units of the target variable.
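Using the same made-up example values, a minimal sketch of MAE shows how the outlier’s influence shrinks compared to MSE:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average of absolute residuals."""
    return float(np.mean(np.abs(y_true - y_pred)))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.4, 10.0])
print(mae(y_true, y_pred))  # (0.2 + 0.1 + 0.1 + 3.0) / 4 = 0.85
```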
3. Huber Loss
Huber Loss combines the best aspects of MSE and MAE. It is defined as:
$$\text{Huber Loss} = \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{for } |y_i - \hat{y}_i| \le \delta \\ \delta \left( |y_i - \hat{y}_i| - \frac{\delta}{2} \right) & \text{otherwise} \end{cases}$$

where $\delta$ is a threshold that determines the boundary between MSE-like and MAE-like behavior.
| Pros | Cons |
| --- | --- |
| Smooth and differentiable, making it suitable for gradient-based optimization. | Requires tuning the $\delta$ parameter, which may be data-specific. |
| Less sensitive to outliers than MSE when $\delta$ is set appropriately. | |
When to use: Huber Loss is ideal when you want a balance between the robustness of MAE and the sensitivity of MSE, particularly in datasets with some outliers.
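Here is a minimal NumPy sketch of Huber Loss (the function name, `delta` default, and example values are illustrative):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for residuals within delta, linear beyond it."""
    residual = np.abs(y_true - y_pred)
    quadratic = 0.5 * residual ** 2
    linear = delta * (residual - 0.5 * delta)
    return float(np.mean(np.where(residual <= delta, quadratic, linear)))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.4, 10.0])
print(huber_loss(y_true, y_pred))  # the residual of 3.0 contributes only linearly
```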
4. Log-Cosh Loss
Log-Cosh Loss is the logarithm of the hyperbolic cosine of the prediction error:
$$\text{Log-Cosh Loss} = \sum_{i=1}^{n} \log(\cosh(\hat{y}_i - y_i))$$
where $\cosh(x) = \frac{e^x + e^{-x}}{2}$ is the hyperbolic cosine function.
| Pros | Cons |
| --- | --- |
| Approximates MSE for small errors and behaves like MAE for large errors. | The loss value is less straightforward to interpret than MSE or MAE. |
| Smooth and differentiable, leading to stable and efficient optimization. | |
When to use: Log-Cosh Loss is a good choice when you want a robust loss function that still offers smooth optimization. It is particularly effective when the dataset contains outliers but you don’t want to ignore their impact entirely.
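A minimal sketch of Log-Cosh Loss follows. Note that calling `np.cosh` directly can overflow for large residuals, so this version uses the algebraically equivalent form $\log(\cosh(x)) = |x| + \log(1 + e^{-2|x|}) - \log 2$:

```python
import numpy as np

def log_cosh_loss(y_true, y_pred):
    """Sum of log(cosh(residual)), written to avoid overflow in cosh."""
    x = y_pred - y_true
    # log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2), stable for large |x|
    return float(np.sum(np.abs(x) + np.log1p(np.exp(-2 * np.abs(x))) - np.log(2)))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.4, 10.0])
print(log_cosh_loss(y_true, y_pred))  # small residuals ≈ x²/2, large ≈ |x| - log 2
```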
5. Quantile Loss
Quantile Loss is used to predict a specified quantile of the target variable, focusing on minimizing the asymmetric loss:
$$\text{Quantile Loss} = \begin{cases} \alpha \cdot (y_i - \hat{y}_i) & \text{if } y_i \ge \hat{y}_i \\[4ex] (1 - \alpha) \cdot (\hat{y}_i - y_i) & \text{otherwise} \end{cases}$$
where $\alpha \in (0, 1)$ is the quantile to predict (e.g., $\alpha = 0.5$ targets the median).
| Pros | Cons |
| --- | --- |
| Allows modeling of conditional quantiles, useful for capturing uncertainty or variability in predictions. | Requires careful selection of the quantile parameter $\alpha$. |
| Provides more flexibility than symmetric loss functions like MSE or MAE. | May not provide a good fit if the data distribution is not aligned with the chosen quantile. |
When to use: Quantile Loss is particularly useful when predicting ranges (e.g., the median or upper quartile) rather than a single mean value, making it valuable in applications like financial forecasting and risk management.
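Here is a minimal sketch of Quantile Loss, sometimes called pinball loss (the example values are illustrative). With `alpha = 0.9`, under-predictions are penalized nine times more than over-predictions:

```python
import numpy as np

def quantile_loss(y_true, y_pred, alpha=0.5):
    """Pinball loss for quantile alpha; alpha = 0.5 recovers half of MAE."""
    residual = y_true - y_pred
    return float(np.mean(np.where(residual >= 0,
                                  alpha * residual,
                                  (1 - alpha) * (-residual))))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.4, 10.0])
print(quantile_loss(y_true, y_pred, alpha=0.9))  # over-predictions cost little here
```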
Choosing the Right Loss Function for Your Model
Choosing the right loss function depends on the nature of your data and your specific goals:
• If your data contains outliers: Consider MAE, Huber Loss, or Log-Cosh Loss, as they are less sensitive to extreme values.
• If you want to penalize large errors more heavily: Use MSE, which amplifies the impact of large deviations.
• If you need to predict quantiles or capture uncertainty: Opt for Quantile Loss, which can provide valuable insights beyond average predictions.
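To make these trade-offs concrete, the sketch below (reusing the illustrative helper functions defined in the earlier snippets) compares how each loss reacts when a single prediction becomes an outlier:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_clean = np.array([2.8, 5.1, 2.4, 6.9])     # all predictions close
y_outlier = np.array([2.8, 5.1, 2.4, 10.0])  # one prediction off by 3

# Log-Cosh is a sum rather than a mean here, so its absolute scale differs
for name, fn in [("MSE", mse), ("MAE", mae),
                 ("Huber", huber_loss), ("Log-Cosh", log_cosh_loss)]:
    print(f"{name:8s} clean={fn(y_true, y_clean):.3f} "
          f"outlier={fn(y_true, y_outlier):.3f}")
```

MSE jumps the most on the outlier, while MAE and Huber grow far more gently, which is exactly the behavior described above.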
Conclusion
The loss function you choose can significantly impact your model’s performance and the quality of your predictions. It’s crucial to understand the trade-offs each function brings and to experiment with different options to see which one aligns best with your data and goals.
By familiarizing yourself with these five regression loss functions, you can make more informed decisions when building machine learning models and enhance your ability to tackle a variety of regression problems.
Happy modeling! 🧠💻