Mean Absolute Error (MAE): The "No Drama" Loss Function


Today, we're diving into Mean Absolute Error (MAE) - the straightforward, no-nonsense member of the loss function family.
The Problem with MBE (A Quick Recap)
Remember how MBE tells us if our model is an optimist or pessimist? Well, that's useful, but it has a weakness - positive and negative errors can cancel each other out. Your model could be wildly wrong, but if it's equally wrong in both directions, MBE might give you a deceptively small value close to zero.
It's like saying your basketball shooting average is great because all your shots that went too far left were balanced out by shots that went too far right. Sure, the average direction might be center, but you're still missing the basket!
When building regression models, it’s essential to have reliable metrics that tell you how well your model’s predictions match the actual outcomes. One of the most widely used loss functions for this purpose is Mean Absolute Error (MAE). MAE is simple, interpretable, and gives you a clear picture of the average magnitude of errors in your predictions, without considering their direction.
What is Mean Absolute Error (MAE)?
MAE doesn't care which direction you're wrong - it just wants to know how wrong you are, period. It's like that friend who doesn't sugar-coat things: "I don't care if you're overshooting or undershooting - you're still off target!"
In technical terms, MAE measures the average absolute difference between predicted values and actual values. By taking the absolute value, we ensure that negative and positive errors don't cancel each other out.
The Math (Still Pretty Simple!)
Mathematically, MAE is defined as:
$$MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$
Note that this looks awfully close to the formula for MBE - well, because it is. The only difference is the |…| in the formula: those vertical bars (pipes) mean taking the absolute value.
Where:
$y_i$ is the actual value
$\hat{y}_i$ is the predicted value
$N$ is the number of observations
The vertical bars $|...|$ represent the absolute value
In plain English: Find the difference between each predicted and actual value, make all these differences positive (absolute value), add them up, and divide by how many you have. Straightforward, right?
MAE is also known as L1 Loss in some circles, which sounds fancy but is just another name for our straightforward friend here.
When Would You Actually Use MAE?
MAE has several appealing qualities that make it useful in many scenarios:
It's Interpretable: The error is in the same units as your original data. If you're predicting house prices in dollars, your MAE is in dollars too.
Robust to Outliers: Unlike some other loss functions we'll encounter later (looking at you, MSE aka Mean Squared Error), MAE doesn't give extra weight to large errors, making it less sensitive to outliers.
Mathematically Straightforward: Simple to understand and implement.
MAE is particularly useful when you want a clear, easy-to-explain measure of accuracy, when your dataset might contain outliers you don't want to overemphasize, and when you want to treat errors of all sizes with equal importance.
However, MAE does have a quirk - it's not differentiable at zero (that absolute value creates a sharp point on the graph). This can make optimization a bit trickier for some algorithms, but in practice, it's rarely a significant issue.
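By the way, when an optimizer does need a gradient, the usual trick is to use the sign of the error as a subgradient and simply pick 0 at the kink. Here's a minimal sketch of that idea in NumPy (not any particular library's implementation):
import numpy as np
# Subgradient of MAE with respect to the predictions: sign(prediction - actual) / N.
# np.sign returns 0 at exactly zero, which is a valid choice at the non-differentiable point.
def mae_subgradient(y_true, y_pred):
    errors = y_pred - y_true
    return np.sign(errors) / len(errors)
print(mae_subgradient(np.array([3.0, -0.5, 2.0, 7.0]), np.array([2.5, 0.0, 2.0, 8.0])))
# Output: [-0.25  0.25  0.    0.25]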
MAE in Action with Python
Let’s see how we calculate MAE in Python:
import numpy as np
# Actual values
y_true = np.array([3, -0.5, 2, 7])
# Predicted values
y_pred = np.array([2.5, 0.0, 2, 8])
# Calculate Mean Absolute Error
mae = np.mean(np.abs(y_true - y_pred))
print(f"Mean Absolute Error (MAE): {mae}")
# Output: Mean Absolute Error (MAE): 0.5
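You don't have to hand-roll this, either - scikit-learn ships the same calculation as a built-in helper, which is what you'll often reach for in practice (using the same y_true and y_pred arrays as above):
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_true, y_pred)
print(f"Mean Absolute Error (MAE): {mae}")
# Output: Mean Absolute Error (MAE): 0.5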
With the same data we used in our MBE example, we get an MAE of 0.5. This tells us that, on average, our predictions are off by 0.5 units, regardless of direction. The MBE for the same data was -0.25.
The MBE value of -0.25 indicates a slight tendency to over-predict - since the value is negative, the predictions are, on average, slightly higher than the actual values.
The MAE value of 0.5 tells us that, on average, our predictions are off by 0.5 units in either direction.
MBE suggested only a small net error in one direction, whereas MAE tells us how large the errors are regardless of direction. This example also shows that using both of these measurements together is valuable.
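If you want to see that contrast in code, here's a quick side-by-side sketch on the same arrays (assuming MBE is defined as the mean of actual minus predicted, as in the MBE post):
import numpy as np
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])
mbe = np.mean(y_true - y_pred)          # direction of the error (bias)
mae = np.mean(np.abs(y_true - y_pred))  # magnitude of the error
print(f"MBE: {mbe}, MAE: {mae}")
# Output: MBE: -0.25, MAE: 0.5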
Implementing MAE in ML.NET
In ML.NET, you can calculate MAE with a similar approach to the one we used for MBE, but this time we'll focus on the absolute differences.
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        double[] yTrue = { 3.0, -0.5, 2.0, 7.0 };
        double[] yPred = { 2.5, 0.0, 2.0, 8.0 };

        // Calculate Mean Absolute Error: the average of the absolute differences
        double mae = yTrue.Zip(yPred, (actual, predicted) => Math.Abs(actual - predicted)).Average();

        Console.WriteLine($"Mean Absolute Error (MAE): {mae}");
        // Output: Mean Absolute Error (MAE): 0.5
    }
}
Unlike MBE, ML.NET actually has MAE built in as a metric, but I'm showing the calculation explicitly so you can understand what's happening under the hood.
Implementing MAE in Azure AI
When working with Azure Machine Learning, tracking MAE is straightforward and, unlike MBE, it's a standard metric that's often built into the platform:
from azureml.core import Run
import numpy as np
# Simulated actual and predicted values
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])
# Calculate Mean Absolute Error
mae = np.mean(np.abs(y_true - y_pred))
# Log the metric to Azure ML
run = Run.get_context()
run.log("Mean Absolute Error", mae)
The beauty of MAE is that it's widely recognized and implemented in most ML frameworks, so you'll often find built-in functions to calculate it.
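One small caveat: the snippet above uses the v1 azureml-core SDK. If you're on the newer Azure ML SDK (v2), metric logging typically goes through MLflow instead - a minimal sketch, assuming MLflow tracking is already configured for your job:
import mlflow
import numpy as np
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])
mae = np.mean(np.abs(y_true - y_pred))
# Log the metric to the current job via MLflow (the v2-style equivalent of run.log)
mlflow.log_metric("mean_absolute_error", mae)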
Real-World Scenario: House Price Prediction
Let’s see this function in practice.
Imagine you're building a model to predict house prices in your city. After training, you calculate an MAE of $25,000. This tells you that, on average, your predictions are off by $25,000 in either direction.
This insight is valuable because:
It's directly interpretable in dollars
It gives you a clear benchmark for improvement
It helps set expectations for stakeholders - "Our model can predict house prices with an average error of $25,000"
Now, is that good or bad? Well, it depends entirely on your context. If you're predicting million-dollar mansions, being off by $25,000 is pretty good! If you're predicting $100,000 starter homes, that's a 25% error - well, not so impressive.
This highlights another important aspect of MAE - it's an absolute measure, not a relative one. Sometimes you might want to consider the error relative to the magnitude of what you're predicting, but that's a story for another loss function (hint: MAPE (Mean Absolute Percentage Error) or MSLE (Mean Squared Logarithmic Error)).
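As a small preview, here's a sketch with purely hypothetical house prices showing how the same absolute error can look very different in relative terms (MAPE gets its own post later):
import numpy as np
# Hypothetical house prices in dollars - illustrative only
y_true = np.array([1_000_000, 950_000, 100_000, 120_000])
y_pred = np.array([975_000, 925_000, 125_000, 95_000])
mae = np.mean(np.abs(y_true - y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(f"MAE: ${mae:,.0f}")    # every prediction is off by $25,000...
print(f"MAPE: {mape:.1f}%")   # ...which is about 2.5% on the mansions but 20-25% on the starter homes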
MAE vs. MBE: The Key Differences
To solidify your understanding, let's contrast MAE with MBE in this comparison table:
Feature | Mean Bias Error (MBE) | Mean Absolute Error (MAE)
--- | --- | ---
What it measures | Directional bias (consistently over- or under-predicting) | Average error magnitude regardless of direction
Can be zero when | Positive and negative errors cancel each other out (model could still be inaccurate) | Only when all predictions are perfect
Sign | Can be negative (over-prediction) or positive (under-prediction) | Always positive (or zero for perfect predictions)
Formula | $\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)$ | $\frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$
Example value | -0.25 (slight tendency to over-predict) | 0.5 (average error of 0.5 units)
Sensitivity to outliers | Can be heavily influenced by outliers | Less sensitive than squared errors, but still affected
Primary use | Detecting systematic bias in predictions | Measuring overall prediction accuracy
Think of it this way: MBE tells you if your model needs a directional adjustment, while MAE tells you how much your model is off overall. Using both together gives you a more complete picture of your model's performance.
Wrapping Up
Mean Absolute Error is straightforward, interpretable, and resilient to outliers. Its simplicity makes it a popular choice for many regression problems, especially when you need a metric that's easy to explain to non-technical stakeholders.
While MAE doesn't tell you about directional bias like MBE does, it gives you a clearer picture of overall prediction accuracy. And unlike some fancier loss functions we'll encounter later, what you see is what you get with MAE - no hidden surprises or complex behaviors. It’s like a simple but honest friend.
When evaluating a regression model, it’s worth considering using both MBE and MAE together - one to check for bias, and one to measure overall accuracy. They're simple enough that calculating both barely adds any complexity, but together they tell a much more complete story about your model's performance.
For example, let's go back to our house price example (which seems to be the go-to example for machine learning tutorials, by the way). Imagine two different house price prediction models:
Model A: MBE = -$5,000, MAE = $25,000
Model B: MBE = $500, MAE = $40,000
If you were to use just MBE, you'd think Model B is better because it has less bias (only $500 vs. -$5,000). However, the MAE shows that Model A's predictions are generally closer to the actual prices, being off by $25,000 on average compared to Model B's $40,000.
What's happening? Model A tends to slightly over-predict (negative MBE), but its predictions stay fairly close to the actual values. Model B has almost no bias because its errors cancel out - it makes big over-predictions and big under-predictions that average to almost zero, hiding how inaccurate it really is. Using both metrics together gives you the complete picture: how biased your model is and how accurate it is overall. Neither metric alone tells the whole story.
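Here's a small, made-up sketch of the kind of error patterns that could produce exactly those numbers - Model A misses modestly in a consistent direction, while Model B's large misses cancel out:
import numpy as np
# Hypothetical (actual - predicted) errors in dollars - illustrative only
model_a_errors = np.array([-30_000, -30_000, 20_000, 20_000])
model_b_errors = np.array([41_000, -40_000, 40_000, -39_000])
for name, errors in [("Model A", model_a_errors), ("Model B", model_b_errors)]:
    print(f"{name}: MBE = ${errors.mean():,.0f}, MAE = ${np.abs(errors).mean():,.0f}")
# Output:
# Model A: MBE = $-5,000, MAE = $25,000
# Model B: MBE = $500, MAE = $40,000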
We also need to keep in mind that even these two metrics together may not give you enough information to make a decision. For example, consider two more house price prediction models:
Model C: MBE = $1,000, MAE = $30,000
Model D: MBE = $1,000, MAE = $30,000
These look identical based on MBE and MAE, but what if Model C makes consistent errors across all price ranges, while Model D is very accurate for average-priced homes but wildly off for luxury properties? Or what if Model C makes many moderate-sized errors, while Model D is usually spot-on but occasionally makes massive mistakes?
This is why we might need additional metrics like:
Mean Squared Error (MSE) to penalize larger errors more heavily
Mean Absolute Percentage Error (MAPE) to understand errors relative to the price
Quantile-based metrics to see how errors are distributed (see the sketch below)
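To give a taste of what "how errors are distributed" means, here's a quick sketch using hypothetical absolute errors for two models whose MAE is identical, like Model C and Model D above:
import numpy as np
# Hypothetical absolute errors in dollars - illustrative only
model_c_abs_errors = np.array([28_000, 30_000, 32_000, 30_000])  # consistently moderate misses
model_d_abs_errors = np.array([5_000, 5_000, 10_000, 100_000])   # usually close, occasionally way off
for name, errs in [("Model C", model_c_abs_errors), ("Model D", model_d_abs_errors)]:
    print(f"{name}: MAE = ${errs.mean():,.0f}, "
          f"median error = ${np.percentile(errs, 50):,.0f}, "
          f"95th percentile = ${np.percentile(errs, 95):,.0f}")
# Both models have the same MAE ($30,000), but Model D's worst-case errors are far larger.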
At this point, I won’t blame you if you are thinking along the lines of, “What the… do I need to run every loss function there is under the sun to get the complete picture?”.
That would be tedious, and it is not practical.
Fortunately, no - you don't need to calculate every possible loss function for every model. The key is understanding which metrics reveal the aspects of performance that matter most for your specific application, hence this series.
The goal isn't to use every metric, but rather to choose metrics that align with what "good performance" means in your specific context. Are you more concerned with avoiding large errors? Maintaining consistent performance across different scales? Ensuring no systematic bias? The answers guide which 2-3 metrics you should focus on.
Remember, choosing the right loss function is about understanding your specific problem and what aspect of model performance matters most to you. Sometimes the simplest option is exactly what you need!
I will put this flowchart on the index of the series too, but here is a simple flowchart to follow regarding loss functions:
Written by TJ Gokken
TJ Gokken is an Enterprise AI/ML Integration Engineer with a passion for bridging the gap between technology and practical application. Specializing in .NET frameworks and machine learning, TJ helps software teams operationalize AI to drive innovation and efficiency. With over two decades of experience in programming and technology integration, he is a trusted advisor and thought leader in the AI community.