Part 4: Linear Regression: Key Techniques for Better Model Performance


Once we’ve built a linear regression model, the next big question is:

“How good is this line at making predictions?”

It’s not just about drawing a line — it’s about understanding how well the model captures real-world patterns. Are predictions close to reality? Are there consistent errors? Can we trust this model for future decisions?

Let’s break this down step by step — using a simple example of predicting exam scores from hours studied.

Example Scenario: Predicting Exam Scores from Study Hours

Imagine this: We’re trying to predict exam scores based on hours studied, and we have collected data from a few friends:

| Hours Studied | Score |
| --- | --- |
| 0 | 42 |
| 1 | 47 |
| 2 | 53 |
| 3 | 58 |
| 4 | 67 |

We plot the points and draw a straight line, which is our linear regression model, and then use it to predict scores for new students.

  • But how do we know if the line is actually good?

  • It might look okay, but are the predictions close to the real scores?

  • Are the errors small and random, or is our model consistently off in some way?

This is where we start checking the model by comparing actual and predicted values, looking at the differences (residuals), and using performance measures to see how reliable our model really is.

Step 1: Comparing Actual vs Predicted Values

Let’s say our model predicts using this equation:


  ŷ = 5x + 40

Here’s what it looks like:

| Hours Studied (x) | Actual Score (y) | Predicted Score (ŷ) | Residual (y - ŷ) |
| --- | --- | --- | --- |
| 0 | 42 | 40 | 2 |
| 1 | 47 | 45 | 2 |
| 2 | 53 | 50 | 3 |
| 3 | 58 | 55 | 3 |
| 4 | 67 | 60 | 7 |

This table shows how the predicted values stack up against the actual ones — the first quick check when evaluating our model. If the predictions are close to the real results, that's a good sign. But if there are big or repeated gaps, it could mean the model is missing some key patterns in the data.
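
Here’s a minimal NumPy sketch of that first check — it uses the same numbers as the table above, nothing more:

```python
import numpy as np

# Data from the table above
x = np.array([0, 1, 2, 3, 4])          # hours studied
y = np.array([42, 47, 53, 58, 67])     # actual scores

# Predictions from the model ŷ = 5x + 40
y_hat = 5 * x + 40

# Residuals: actual minus predicted
residuals = y - y_hat

print("Predicted:", y_hat)        # [40 45 50 55 60]
print("Residuals:", residuals)    # [2 2 3 3 7]
```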

Step 2: From Residuals to Error Metrics

Once we calculate residuals (errors between actual and predicted values), we can summarize overall model performance using a few key metrics.

Let’s use this example dataset:

| Student | Actual Score (y) | Predicted Score (ŷ) | Residual (y - ŷ) | Residual² | \|y - ŷ\| |
| --- | --- | --- | --- | --- | --- |
| 1 | 50 | 52 | -2 | 4 | 2 |
| 2 | 60 | 58 | 2 | 4 | 2 |
| 3 | 70 | 66 | 4 | 16 | 4 |

In an ideal world, residuals would all be zero, meaning the model predicted every value perfectly. But real-world models aren’t perfect. These residuals tell us how far off each prediction is:

  • A small residual means the model did well on that point.

  • A large residual shows a bigger error — the model missed the mark.

If these residuals seem randomly scattered around zero, the model is probably performing well overall. But if there's a pattern — like all residuals being positive, or increasing/decreasing — it may indicate that our model is missing something, such as a nonlinear trend.

MSE (Mean Squared Error)

MSE (Mean Squared Error) is one of the most popular metrics for evaluating how well a regression model performs. It measures the average of the squared differences between the actual values and the predicted values — also known as residuals.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

  • yᵢ: actual value

  • ŷᵢ: predicted value

  • n: number of data points

Why Do We Square the Residuals?

  1. To avoid cancellation: Residuals can be positive or negative. If we simply averaged them, errors could cancel each other out — giving a misleading sense of accuracy. Squaring ensures all errors are positive.

  2. To penalize big mistakes: Squaring amplifies larger errors. An error of 4 becomes 16, while 1 becomes just 1. This way, MSE gives more weight to bigger mistakes, making it useful when large errors are especially costly — like in finance or healthcare predictions.

Example with Student Scores

Let’s say we built a model to predict student scores based on hours studied. Here’s the data:

| Student | Actual Score (yᵢ) | Predicted Score (ŷᵢ) | Residual (yᵢ − ŷᵢ) | Residual² |
| --- | --- | --- | --- | --- |
| 1 | 50 | 52 | -2 | 4 |
| 2 | 60 | 58 | 2 | 4 |
| 3 | 70 | 66 | 4 | 16 |

To calculate MSE, we take the average of the squared residuals:

$$\text{MSE} = \frac{4 + 4 + 16}{3} = \frac{24}{3} = 8$$

So, the Mean Squared Error is 8, meaning the average squared prediction error is 8 (in squared score units).
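
The same calculation in NumPy — a minimal sketch that simply mirrors the arithmetic above:

```python
import numpy as np

y_true = np.array([50, 60, 70])
y_pred = np.array([52, 58, 66])

# Mean of the squared residuals
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 8.0
```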

When to use MSE: Use MSE when we want our model to punish large errors more. This is especially useful in scenarios where one big mistake can outweigh several small ones — like predicting blood pressure, loan default risks, or business revenue forecasts.

RMSE (Root Mean Squared Error)

RMSE is simply the square root of the Mean Squared Error (MSE). It tells us, on average, how far our model's predictions are from the actual values — in the same units as our target variable.

So, while MSE gives you squared errors, RMSE brings it back to the original scale — making it much easier to interpret.

$$\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^2 }$$

This formula tells us:

  • yᵢ: actual value

  • ŷᵢ: predicted value

  • n: number of data points

Example: Student Scores

Let’s say our model is trying to predict student scores based on hours studied. We’ve got this small dataset:

| Student | Actual Score (y) | Predicted Score (ŷ) | Residual (y - ŷ) | Residual² |
| --- | --- | --- | --- | --- |
| 1 | 50 | 52 | -2 | 4 |
| 2 | 60 | 58 | 2 | 4 |
| 3 | 70 | 66 | 4 | 16 |

Now, we calculate the Mean Squared Error (MSE):

$$\text{MSE} = \frac{4 + 4 + 16}{3} = \frac{24}{3} = 8$$

Then, we take the square root:

$$\text{RMSE} = \sqrt{8} \approx 2.83$$

So, our model is off by about 2.83 marks on average. That’s much easier to understand and communicate than saying “the average squared error is 8.”
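
In code, RMSE is just the square root of the MSE we computed above (a minimal NumPy sketch):

```python
import numpy as np

y_true = np.array([50, 60, 70])
y_pred = np.array([52, 58, 66])

# RMSE = square root of the mean squared error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(round(rmse, 2))  # 2.83
```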

When to use RMSE: RMSE is like a “friendlier” version of MSE — it still penalizes large errors more than small ones (since it’s built on squaring), but it returns the error in real-world units.

Use RMSE when:

  • We want to compare models in a way that reflects real-world scale.

  • We care about highlighting large errors more than small ones.

  • We want to explain our model's accuracy to someone without diving into math-heavy details.

MAE (Mean Absolute Error)

MAE calculates the average of the absolute differences between the actual values and the predicted values. In plain terms:

“How far off is my model, on average?”

No squaring. No root-taking. Just the raw gap between reality and prediction, measured fairly and clearly.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|$$

  • yᵢ: actual value

  • ŷᵢ: predicted value

  • n: number of data points

  • ∣⋅∣: absolute value

Let’s Revisit Our Student Score Example

| Student | Actual Score (y) | Predicted Score (ŷ) | Residual (y - ŷ) | \|y - ŷ\| |
| --- | --- | --- | --- | --- |
| 1 | 50 | 52 | -2 | 2 |
| 2 | 60 | 58 | 2 | 2 |
| 3 | 70 | 66 | 4 | 4 |

Now, let’s calculate the MAE:

$$\text{MAE} = \frac{2 + 2 + 4}{3} = \frac{8}{3} \approx 2.67$$

So, our model is off by about 2.67 marks on average.
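
And the matching NumPy sketch for MAE:

```python
import numpy as np

y_true = np.array([50, 60, 70])
y_pred = np.array([52, 58, 66])

# Mean of the absolute residuals
mae = np.mean(np.abs(y_true - y_pred))
print(round(mae, 2))  # 2.67
```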

When to use MAE: If we’re looking for a quick, clear, and honest measure of error, MAE is our go-to. It gives us the raw truth — how much our model is off, on average, in the most human-readable way.

MAE vs. MSE/RMSE

Here’s a quick comparison of the three main error metrics — MAE, MSE, and RMSE — based on our sample data:

  • MAE (2.67): Average size of the errors, treats all mistakes equally.

  • MSE (8): Penalizes larger errors more due to squaring.

  • RMSE (2.83): Similar to MSE but easier to interpret since it's in the same unit as the target variable.

| Metric | Focus | Penalizes Large Errors More? | Units | Interpretability |
| --- | --- | --- | --- | --- |
| MAE | Average absolute error | ❌ No | Same as output | ✅ Very intuitive |
| MSE | Average squared error | ✅ Yes | Squared units | ❌ Less intuitive |
| RMSE | Square root of MSE | ✅ Yes | Same as output | ✅ Fairly intuitive |
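
If scikit-learn happens to be installed (an assumption — the code later in this post sticks to plain NumPy), all three metrics can be computed in a few lines; RMSE is obtained here by taking the square root of the MSE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([50, 60, 70])
y_pred = np.array([52, 58, 66])

mae = mean_absolute_error(y_true, y_pred)   # ≈ 2.67
mse = mean_squared_error(y_true, y_pred)    # 8.0
rmse = np.sqrt(mse)                         # ≈ 2.83
r2 = r2_score(y_true, y_pred)               # coefficient of determination

print(f"MAE: {mae:.2f}, MSE: {mse:.2f}, RMSE: {rmse:.2f}, R²: {r2:.2f}")
```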

The R² Score — How well does our line fit?

Imagine we're using our model to predict student scores based on hours studied. Some predictions will be spot-on, others slightly off. But how do we know — overall — if the model is really doing a good job?

That’s where the R² Score, or coefficient of determination, comes in. It tells us how much of the variation in the actual outcomes (like exam scores) can be explained by our model’s predictions.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^2}{\sum_{i=1}^{n} (y_{i} - \bar{y})^2}$$

  • yᵢ: actual value

  • ŷᵢ: predicted value

  • ȳ: mean of the actual values

  • n: number of data points

  • R²: proportion of variance explained by the model

Let’s Use Our Example

We earlier trained a linear regression model to predict student scores based on hours studied, and it used this equation:

ŷ = 5x + 40

For R², a score of 1 means the model predicts everything perfectly, while a score of 0 means it does no better than just guessing the average score for everyone, regardless of hours studied.

Here’s the dataset we used:

| Hours Studied (x) | Actual Score (y) | Predicted Score (ŷ) | Residual (y - ŷ) |
| --- | --- | --- | --- |
| 0 | 42 | 40 | 2 |
| 1 | 47 | 45 | 2 |
| 2 | 53 | 50 | 3 |
| 3 | 58 | 55 | 3 |
| 4 | 67 | 60 | 7 |

Step 1: Compute ȳ (mean of actual scores):

$$\bar{y} = \frac{42 + 47 + 53 + 58 + 67}{5} = 53.4$$

Step 2: Calculate the squared errors (numerator):

$$\sum (y_i - \hat{y}_i)^2 = 2^2 + 2^2 + 3^2 + 3^2 + 7^2 = 4 + 4 + 9 + 9 + 49 = 75$$

Step 3: Calculate the total variance from the mean (denominator):

$$\sum (y_i - \bar{y})^2 = (42 - 53.4)^2 + (47 - 53.4)^2 + \dots + (67 - 53.4)^2 = 377.2$$

Step 4: Plug into the R² formula:

$$R^2 = 1 - \frac{75}{377.2} \approx 0.80$$

An R² of about 0.80 means the model explains roughly 80% of the variation in student scores. The rest — around 20% — might be due to other factors like exam stress, sleep quality, or guesswork.

That’s a reasonably good fit, but it still leaves room for improvement. Maybe the relationship between study hours and scores isn’t perfectly linear, or we’re missing another variable like study quality.
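
Here’s the same R² calculation as a minimal NumPy sketch, using the five points from the table above:

```python
import numpy as np

y = np.array([42, 47, 53, 58, 67])       # actual scores
y_hat = np.array([40, 45, 50, 55, 60])   # predictions from ŷ = 5x + 40

ss_res = np.sum((y - y_hat) ** 2)        # 75    (unexplained variation)
ss_tot = np.sum((y - np.mean(y)) ** 2)   # 377.2 (total variation)

r2 = 1 - ss_res / ss_tot
print(round(r2, 2))  # ≈ 0.8
```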

Why do we need R² when we have MAE, MSE and RMSE?

Metrics like MAE, MSE, and RMSE tell us how far off the model’s predictions are from the actual values — they measure the accuracy or size of the errors. But there's one thing they don’t tell us:

Is the model actually capturing the underlying pattern in the data?

That’s where R² (R-squared) comes in. It adds another layer of understanding — showing how well the model explains the variation in the data, not just how close its guesses are.

| Metric | Focus | Good For |
| --- | --- | --- |
| MAE / MSE / RMSE | Error size | Measuring prediction accuracy |
| R² | Explanatory power | Judging fit and comparing models |

For example, if R² is 0.85, that means 85% of the variation in exam scores is explained by how many hours were studied. It tells us the model understands the trend — not just makes close guesses.

While error metrics answer “How wrong is the model?”, R² answers “Is the model learning something useful?”. That’s why it’s especially helpful when comparing models — a higher R² usually means a model is better at capturing relationships in the data.

In practice, we look at R² alongside MAE, MSE, RMSE, and visual plots. Together, they help paint a complete picture of how accurate and how insightful the model really is.

The Power of Visualization — Plots that Reveal the Truth

While metrics like MSE, RMSE, MAE, and R² give us valuable numerical insights into model performance, visualizations can uncover patterns those numbers might miss. Think of them as our model’s X-ray — revealing where it performs well, where it stumbles, and whether it’s even solving the right problem.

1. Actual vs Predicted Plot
This is a scatter plot where each point compares the model’s prediction (ŷ) to the actual outcome (y). If the model were perfect, all points would lie exactly on the 45° diagonal line. Deviations from this line show where predictions fall short. This plot gives an immediate, intuitive grasp of the model's overall accuracy.

  • Each point shows the actual value vs the predicted one.

  • The closer the points are to the 45° line, the better our predictions.
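
A minimal Matplotlib sketch of this plot, using the toy actual and predicted scores from earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

y_true = np.array([42, 47, 53, 58, 67])
y_pred = np.array([40, 45, 50, 55, 60])

plt.scatter(y_pred, y_true, color='blue', label='Predictions')

# 45° reference line: perfect predictions would fall exactly on it
lims = [min(y_pred.min(), y_true.min()), max(y_pred.max(), y_true.max())]
plt.plot(lims, lims, color='red', linestyle='--', label='Perfect prediction')

plt.xlabel('Predicted score (ŷ)')
plt.ylabel('Actual score (y)')
plt.title('Actual vs Predicted')
plt.legend()
plt.show()
```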

2. Residual Plot

Here, we plot the residuals (y - ŷ) on the y-axis against either the predicted values or the independent variable (x) on the x-axis. A good model will show residuals scattered randomly around the horizontal line at 0. If you see curves, patterns, or clusters, it may signal that the model is missing a nonlinear trend, or that certain ranges of x values are consistently over- or under-predicted.

  • Plot residuals (errors) against predicted values.

  • If the residuals look like a random cloud, the model is good.

  • If we see a pattern (like a curve or funnel), our model is likely missing something.
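
A matching sketch for the residual plot:

```python
import numpy as np
import matplotlib.pyplot as plt

y_true = np.array([42, 47, 53, 58, 67])
y_pred = np.array([40, 45, 50, 55, 60])
residuals = y_true - y_pred

plt.scatter(y_pred, residuals, color='blue')
plt.axhline(0, color='red', linestyle='--')   # zero-error reference line

plt.xlabel('Predicted score (ŷ)')
plt.ylabel('Residual (y - ŷ)')
plt.title('Residual Plot')
plt.show()
```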

3. Histogram of Residuals
This plot helps check the distribution of errors. Ideally, residuals should form a bell-shaped curve centered around zero — suggesting that errors are normally distributed. Skewed or multi-peaked distributions could point to bias or model misfit.

  • Helps us check if errors are evenly spread and mostly small.

  • A bell-shaped (normal) distribution is a good sign.
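
A quick sketch (our toy example has only five residuals, so with real data the shape of the histogram would be far more informative):

```python
import numpy as np
import matplotlib.pyplot as plt

residuals = np.array([2, 2, 3, 3, 7])  # residuals from the toy example

plt.hist(residuals, bins=5, color='skyblue', edgecolor='black')
plt.xlabel('Residual')
plt.ylabel('Frequency')
plt.title('Histogram of Residuals')
plt.show()
```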

4. Q-Q Plot (Quantile-Quantile)
For more statistically-minded users, this plot checks whether residuals follow a normal distribution by comparing quantiles. It’s often used to validate assumptions in linear regression, especially when we rely on inference.

  • If the residuals fall neatly along the straight diagonal line, it means they are normally distributed — which is ideal for linear regression.

  • If the points curve away from the line, it suggests non-normality, possibly indicating outliers or issues with model assumptions.
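
A minimal sketch using SciPy’s probplot (assuming SciPy is available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

residuals = np.array([2, 2, 3, 3, 7])  # residuals from the toy example

# Compare residual quantiles against theoretical normal quantiles
stats.probplot(residuals, dist="norm", plot=plt)
plt.title('Q-Q Plot of Residuals')
plt.show()
```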

What does our regression model need to work well?

So far, we’ve checked how well our model performs using residuals, error metrics, and visual tools. But even if everything looks good, it doesn’t always mean the model is reliable for real-world use.

Why? Because linear regression depends on a few key assumptions. If these are not met, the model might still fit the data — but the results, like R² or predictions, could be misleading.

Let’s look at the four important assumptions that every linear regression model needs to follow.

1. Linearity — The relationship should be a straight line

Linear regression assumes that the outcome (like marks) changes in a straight-line pattern with the input (like study hours). If the real relationship is curved and we fit a straight line, the model will miss important trends.

How to check: Look at the residual plot. A random scatter is good. But if the points form a curve, it means the model is forcing a straight line where it doesn’t belong.

2. Independence of Errors — Predictions Shouldn’t Be Connected

The errors (residuals) from one prediction shouldn’t influence another. Each data point and its error must stand alone. If they’re linked — like in time-based data — the model’s results might not be trustworthy.

How to check: Plot the residuals in sequence (like by time). If you notice a pattern or trend, the errors may not be independent. The Durbin-Watson test is another tool that helps check this.
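
A minimal sketch of the Durbin-Watson check, assuming statsmodels is installed; values near 2 suggest independent errors, while values near 0 or 4 point to positive or negative autocorrelation:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

residuals = np.array([2, 2, 3, 3, 7])  # residuals from the toy example

dw = durbin_watson(residuals)
print(round(dw, 2))
```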

3. Homoscedasticity — Equal Error Spread

The model assumes that the size of the errors stays roughly the same across all input values. If the model is accurate for some inputs but way off for others, this assumption is broken.

How to check: Look at the residual vs. predicted plot. The spread of residuals should be even. If you see a funnel shape — where errors grow wider or narrower — it’s a sign of heteroscedasticity (unequal error spread).

4. Normality of Residuals — Errors Should Follow a Bell Curve

To make reliable predictions and use statistical tests (like confidence intervals), the model’s errors should follow a normal (bell-shaped) distribution.

How to check: Check the histogram of residuals — it should look bell-shaped. A Q-Q Plot should show points close to a straight line. Big deviations can signal problems, often caused by outliers or skewed data.
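
Beyond the plots, a quick numerical test such as Shapiro-Wilk (not covered above — just an extra check, assuming SciPy is available) can back up the visual impression:

```python
import numpy as np
from scipy import stats

residuals = np.array([2, 2, 3, 3, 7])  # residuals from the toy example

# Shapiro-Wilk: a small p-value (e.g. < 0.05) suggests the residuals are not normally distributed
stat, p_value = stats.shapiro(residuals)
print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```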

Why All This Matters

Even if our model looks accurate, breaking these rules can lead to misleading results — especially in real-world decisions or forecasts. These checks help us go beyond building models… to trusting them.

Wrapping It All Up — Making Sense of Model Performance

"""
linear_regression_module.py

A minimal, educational implementation of simple linear regression using NumPy and Matplotlib.

Includes:
- Computation of slope and intercept using least squares
- Prediction using the regression line
- Evaluation metrics: MSE, RMSE, R²
- Visualization of the regression line and residuals

Author: Abhilash PS
"""

import numpy as np
import matplotlib.pyplot as plt


# -----------------------------
# Core Regression Calculations
# -----------------------------

def compute_regression_coefficients(x, y):
    """
    Computes the slope and intercept using the least squares method.
    Returns:
        m (float): slope
        b (float): y-intercept
    """
    x = np.array(x)
    y = np.array(y)
    x_mean = np.mean(x)
    y_mean = np.mean(y)

    numerator = np.dot(x - x_mean, y - y_mean)
    denominator = np.dot(x - x_mean, x - x_mean)

    m = numerator / denominator
    b = y_mean - m * x_mean
    return m, b


def predict(x, m, b):
    """
    Predicts target values using the regression equation y = mx + b.
    """
    x = np.array(x)
    return m * x + b


# -----------------------------
# Evaluation Metrics
# -----------------------------

def calculate_mse(y_true, y_pred):
    """
    Calculates Mean Squared Error (MSE) between true and predicted values.
    """
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.mean((y_true - y_pred) ** 2)


def calculate_rmse(y_true, y_pred):
    """
    Calculates Root Mean Squared Error (RMSE).
    """
    return np.sqrt(calculate_mse(y_true, y_pred))


def calculate_r2_score(y_true, y_pred):
    """
    Calculates R² (coefficient of determination).
    """
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


# -----------------------------
# Visualization
# -----------------------------

def plot_regression_with_residuals(x, y_true, y_pred, m, b, title="Linear Regression Fit and Residuals"):
    """
    Plots the data points, regression line, and residuals.
    """
    x = np.array(x)
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)

    plt.figure(figsize=(8, 5))
    plt.scatter(x, y_true, color='blue', label='Actual Data')
    plt.plot(x, y_pred, color='red', label=f'Prediction: y = {m:.2f}x + {b:.2f}')

    # Plot residual lines (dotted)
    for xi, yi, yp in zip(x, y_true, y_pred):
        plt.plot([xi, xi], [yi, yp], color='gray', linestyle='dotted')

    plt.xlabel('Feature (x)')
    plt.ylabel('Target (y)')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()


# -----------------------------
# Main Pipeline
# -----------------------------

def run_pipeline(x, y):
    """
    Executes the full regression pipeline:
    - Computes coefficients
    - Makes predictions
    - Evaluates performance
    - Displays results and plots
    """
    m, b = compute_regression_coefficients(x, y)
    y_pred = predict(x, m, b)

    mse = calculate_mse(y, y_pred)
    rmse = calculate_rmse(y, y_pred)
    r2 = calculate_r2_score(y, y_pred)

    print(f"Regression Equation: y = {m:.2f}x + {b:.2f}")
    print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}, R²: {r2:.3f}\n")

    plot_regression_with_residuals(x, y, y_pred, m, b)


# -----------------------------
# Demo (Example Usage)
# -----------------------------

if __name__ == "__main__":
    # Example dataset
    x = [1, 2, 3, 4, 5]
    y = [50, 55, 65, 70, 77]

    run_pipeline(x, y)
```

By now, we’ve gone from understanding how linear regression models make predictions to knowing how to evaluate those predictions in a meaningful way.

We began with a simple comparison of actual vs. predicted values — the first sanity check. Then we looked at residuals to spot where the model misses the mark. Along the way, we explored key metrics like MAE, MSE, RMSE, and R² to assess performance from different angles — how accurate the predictions are, how much large errors matter, and how well the model captures the underlying trend.

We also touched on something easy to miss but super important: assumptions. Linear regression isn’t just about drawing a straight line — it works best when a few things are true behind the scenes. The relationship should be linear, errors should be independent and evenly spread, and residuals should follow a normal distribution. If these aren't met, even a model with “good” metrics can mislead us.

Finally, we turned to visual tools like residual plots, histograms, and Q-Q plots — because sometimes what we see reveals what numbers can’t. These plots offer a clear, intuitive sense of how our model behaves — and whether it’s meeting those assumptions.

Together, these techniques give us a complete evaluation toolkit. No single metric or chart tells the full story, but when used together, they help us decide whether our model is solid, needs fixing, or isn’t quite ready for the real world.

What’s Next?

Now that we know how to evaluate our model’s performance, it’s time to ask a deeper question:
Is our model learning just right — or not enough — or maybe… too much?

In the next part, we’ll explore the two biggest traps in machine learning: underfitting and overfitting.
We’ll learn how to spot them, why they happen, and what we can do to fix or avoid them — with simple visuals and real-world examples.

Stay tuned!
