Part 3: Understanding Linear Regression for Accurate Predictions - The Line Function

In this part of the series, we will dive into one of the most foundational concepts in machine learning — Linear Regression.
We'll explore its core ideas and intuitively walk through the mathematics involved. Using only Python and NumPy, we'll implement the algorithm to gain a deeper understanding of what the regression line truly represents and how the model minimizes errors to enhance predictions.
Mathematics Behind Linear Regression
At the core of Linear Regression is a statistical technique called Ordinary Least Squares (OLS). It finds the line of best fit by minimizing the sum of squared errors (the differences between predicted and actual values).
$$\text{OLS Objective:} \quad \min_{m,\, b} \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right)^2$$
This principle is directly adopted in supervised learning, where the model learns from input-output pairs by minimizing a loss function (error) — in this case, the squared loss.
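To make the objective concrete, here is a minimal sketch (with made-up hours/score data and a hypothetical helper called sum_squared_errors) that evaluates the OLS objective for two candidate lines; the line with the smaller sum of squared errors is the better fit under this criterion.

```python
import numpy as np

# Hypothetical example data: hours studied vs. exam score
x = np.array([0, 1, 2, 3, 4])
y = np.array([40, 45, 50, 55, 60])

def sum_squared_errors(m, b, x, y):
    """Evaluate the OLS objective for a candidate line y = m*x + b."""
    predictions = m * x + b
    residuals = y - predictions
    return np.sum(residuals ** 2)

# Compare two candidate lines: the second follows the data exactly,
# so its sum of squared errors is smaller.
print(sum_squared_errors(3.0, 45.0, x, y))  # 45.0 for a rough guess
print(sum_squared_errors(5.0, 40.0, x, y))  # 0.0 for the true trend in this toy data
```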
What is Linear Regression?
Formal definition
Linear Regression is a supervised learning algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data.
$$\text{Standard Linear Regression Equation:} \quad y = mx + b$$
Layman’s Definition
Linear regression is just about drawing the best straight line through a bunch of points so we can make predictions.
For example, if someone studied 5 hours yesterday and scored 70, we might guess that studying 6 hours tomorrow could result in a slightly higher score.
Linear Regression helps us make that kind of educated guess — not randomly, but mathematically.
It uses past data to uncover trends and predict future outcomes in a logical, data-driven way.
Foundation of the Line Equation
At the heart of linear regression is a simple equation:
y = mx + b

- y = predicted value (target/output)
- x = input value (feature)
- m = slope of the line
- b = intercept, the value of y when x = 0 (where the line crosses the y-axis)
Understanding this equation is fundamental to Linear Regression.
Slope (m): the slope tells us how much the output y changes when we increase the input x by 1 unit.

- Positive slope → output increases with input
- Negative slope → output decreases with input
- Zero slope → no relationship

Intercept (b): the starting point. The intercept is the value of y when x = 0. It tells us the baseline or default value when no input is present.
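As a quick illustration, the sketch below uses an arbitrary slope and intercept (m = 5, b = 40, chosen purely for this example). It shows that each one-unit increase in x changes the prediction by exactly m, and that the prediction at x = 0 equals b.

```python
import numpy as np

m, b = 5.0, 40.0          # assumed slope and intercept for illustration
x = np.arange(0, 4)       # inputs 0, 1, 2, 3

y = m * x + b             # predictions from the line equation
print(y)                  # [40. 45. 50. 55.]

print(y[0] == b)          # True: at x = 0 the prediction is the intercept
print(np.diff(y))         # [5. 5. 5.]: each extra unit of x adds m to y
```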
How Does the Model Learn the Slope and Intercept?
When we say a linear regression model “learns” the slope (m) and intercept (b), we mean it is figuring out the values that define the best-fitting line through the data — the one that makes the most accurate predictions overall.
Why the Least Squares Method?
We use the Least Squares Method because it helps us find the most accurate line by minimizing the total error between predicted and actual values. Here’s how it works:
For each data point, the model calculates how far off the prediction is — this is called a residual.
It then squares each residual, so large errors count more.
Finally, it adds up all these squared errors and adjusts the line to make this total as small as possible.
$$\text{Loss} = \sum (y_i - \hat{y}_i)^2$$
In simple terms: it’s like saying, “Let’s draw the line that’s overall as close as possible to all the data points.”
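Here is a small sketch of that procedure, using hypothetical data and an arbitrary candidate line (not necessarily the best one): compute each residual, square it, and sum.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])       # hypothetical inputs
y = np.array([52.0, 54.0, 61.0, 66.0])   # hypothetical actual values

m, b = 4.0, 47.0                          # a candidate line, chosen for illustration
y_hat = m * x + b                         # predictions from the candidate line

residuals = y - y_hat                     # how far off each prediction is
loss = np.sum(residuals ** 2)             # squaring makes large errors count more

print(residuals)  # [ 1. -1.  2.  3.]
print(loss)       # 15.0
```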
We then apply calculus to this sum of squared residuals and derive formulas for the slope m and intercept b that guarantee the loss is minimized:
$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$b = \bar{y} - m\bar{x}$$
The intercept formula follows from the fact that the best-fit line always passes through the point given by the means of x and y. Together, these give the slope and intercept that minimize total squared error — the best possible line through the data based on our chosen loss function.
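As a quick sanity check (a sketch with made-up noisy data), the closed-form slope and intercept should match what NumPy's built-in np.polyfit computes for a degree-1 fit:

```python
import numpy as np

# Hypothetical data with some noise around a linear trend
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

# Closed-form least squares estimates
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

# NumPy's degree-1 polynomial fit minimizes the same loss, so it should agree
m_np, b_np = np.polyfit(x, y, deg=1)
print(np.allclose([m, b], [m_np, b_np]))  # True (up to floating-point precision)
```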
A real world example
Let’s say our line is:
$$\text{Score} = m \times \text{Hours} + b$$
| Hours Studied (x) | Predicted Score (y) |
| --- | --- |
| 0 | 40 |
| 1 | 45 |
| 2 | 50 |
| 3 | 55 |
This line means:

- Every extra hour of study adds m points to the predicted score.
- If someone studies 0 hours, the score starts at b.

Both m and b are parameters the model learns from the training data. Once the model figures out the best values for these, it can predict new outcomes. The following script implements this from scratch using NumPy and plots the result with Matplotlib:
```python
import numpy as np
import matplotlib.pyplot as plt


def compute_linear_regression(x, y):
    """Calculate slope (m) and intercept (b) using the least squares method."""
    x = np.array(x)
    y = np.array(y)
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    numerator = np.sum((x - mean_x) * (y - mean_y))
    denominator = np.sum((x - mean_x) ** 2)
    m = numerator / denominator
    b = mean_y - m * mean_x
    return m, b


def predict(x, m, b):
    """Generate predictions for given x using the regression line."""
    return m * x + b


def print_regression_equation(m, b):
    print(f"Equation of the line: y = {m:.2f}x + {b:.2f}")


def plot_regression_line(x, y, m, b):
    """Plot the data points and the regression line."""
    x = np.array(x)
    y = np.array(y)
    y_pred = predict(x, m, b)
    plt.figure(figsize=(6, 4))
    plt.scatter(x, y, color='blue', label='Actual Data')
    plt.plot(x, y_pred, color='red', linestyle='--', label='Regression Line')
    plt.title('Linear Regression: Best Fit Line')
    plt.xlabel('Hours Studied')
    plt.ylabel('Score')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()


def main():
    # Input data
    x = [0, 1, 2, 3]
    y = [40, 45, 50, 55]
    # Compute regression
    m, b = compute_linear_regression(x, y)
    print_regression_equation(m, b)
    # Predict and plot
    plot_regression_line(x, y, m, b)


if __name__ == "__main__":
    main()
```
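Running this script prints "Equation of the line: y = 5.00x + 40.00" and shows the fitted line passing exactly through the four data points, since the example scores rise by exactly 5 points per hour from a baseline of 40.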
But how does it determine what's best?
To evaluate how well a linear regression model is performing, we need to understand a few core concepts that connect the model's predictions to real-world effectiveness. Here's a breakdown of those essential concepts:
- Actual vs Predicted Values: The model generates predictions (ŷ), which are then compared to the true values (y). The closeness of these predictions to the actual values serves as a measure of performance.
- Residuals: These are the differences between the actual values and the predicted values. Residuals indicate how far off each prediction is, with smaller residuals signifying more accurate predictions.
- Loss/Error Metrics (a short code sketch computing these follows this list):
  - MSE (Mean Squared Error): This metric calculates the average of the squared differences between predicted and actual values, placing more emphasis on larger errors.
  - RMSE (Root Mean Squared Error): By taking the square root of MSE, RMSE provides a measure in the same unit as the target variable, making it easier to interpret.
  - MAE (Mean Absolute Error): This optional but useful metric calculates the average of the absolute differences between actual and predicted values and is less influenced by outliers.
  - R² Score (Coefficient of Determination): This score indicates how well the model fits the actual data. A value closer to 1 suggests that the model makes more accurate predictions.
- Visual Inspection (Seeing Beyond the Metrics): While metrics like MSE or R² provide numerical evaluations, they may not capture the full picture. Visual inspection can uncover patterns and issues that metrics might miss.
  - Scatter Plot: Assesses how well the regression line fits the data.
  - Residual Plot: A random scatter suggests a good fit, while patterns indicate potential problems.
  - Predicted vs Actual Plot: Points close to the line indicate better predictions.
- Overfitting and Underfitting: These concepts describe how well the model captures patterns in the data. A model that is too simple may miss patterns (underfitting), while a model that is too complex may memorize noise (overfitting).
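As referenced in the list above, here is a minimal sketch of these error metrics in plain NumPy. The actual values and predictions are made up for illustration; the predictions are assumed to come from a hypothetical fitted line y = 5x + 40 applied to slightly noisy scores.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute common error metrics for a regression model."""
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                      # Mean Squared Error
    rmse = np.sqrt(mse)                                # Root Mean Squared Error
    mae = np.mean(np.abs(residuals))                   # Mean Absolute Error
    ss_res = np.sum(residuals ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    r2 = 1 - ss_res / ss_tot                           # coefficient of determination
    return mse, rmse, mae, r2

# Hypothetical noisy scores and the predictions from the line y = 5x + 40
y_true = np.array([41.0, 44.0, 52.0, 54.0])
y_pred = np.array([40.0, 45.0, 50.0, 55.0])

mse, rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.2f}, RMSE={rmse:.2f}, MAE={mae:.2f}, R2={r2:.3f}")
```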
Summary
In this article, we explored Linear Regression, a basic concept in machine learning from both intuitive and mathematical perspectives. We focused on the Ordinary Least Squares method, which minimizes error to create the best-fit line. The discussion covered key components like slope, intercept, and model learning. We also wrote Python code to implement linear regression using NumPy and visualized model performance through various error metrics and plots.
Understanding Linear Regression is a powerful first step toward mastering supervised learning. It shows you how models learn from data — not just to memorize, but to generalize and make smart predictions.
What’s Next?
In the upcoming articles of this series, we will explore how to evaluate the performance of a linear regression model. We briefly introduced these concepts at the end of this article, and the next article will delve into them in more detail.