Part 2: Regression in Machine Learning: Predicting the Future with Data


Previously in Part 1, we introduced supervised learning — teaching machines with examples. Now, let’s dive deeper into regression, the method behind predicting things like prices, temperatures, or growth. In this post, we’ll learn what regression is, when to use it, and why it matters.
Intro: What is Regression?
Regression is a type of supervised learning used to predict a continuous numeric value based on input features. The goal is to learn a mathematical relationship—or mapping—between the inputs (also called independent variables) and the output (or dependent variable).
Example Problem It Solves:
Predicting the price of a house based on its size, location, and number of bedrooms.
In this case, the model learns from past data to understand how each feature influences the house price. Once trained, it can predict the price of a new house given its features.
We know the inputs (like size and location), and the model helps us predict the output (price), even for examples it hasn't seen before.
Why Regression Matters
Regression is essential because it helps us make informed predictions about the future based on past data. It’s widely used in real-world scenarios where the outcome is a numeric value.
Weather Forecasting – Predicting tomorrow’s temperature based on historical weather data.
Sales Forecasting – Estimating next quarter’s revenue using trends and marketing spend.
Stock Market Analysis – Forecasting future stock prices using past price movements and indicators.
Fuel Efficiency Estimation – Predicting how many kilometers a car can travel per litre based on engine specs and weight.
In short, regression gives machines the ability to forecast, estimate, and plan — making it a foundational tool in many industries like finance, healthcare, retail, and transportation.
Types of Regression
Linear Regression
Here, we try to draw a straight line through the data points that best captures the relationship between the input and the output.
It works best when the output changes at a steady, consistent rate as the input increases or decreases; this is called a linear relationship.
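To make this concrete, here is a minimal sketch using NumPy (the library this series will use later) to fit a straight line to a handful of invented house prices; the numbers are made up purely for illustration:

```python
import numpy as np

# Invented data: house size in square metres vs. price in thousands
sizes = np.array([50, 70, 90, 110, 130], dtype=float)
prices = np.array([150, 200, 260, 310, 360], dtype=float)

# Fit a straight line: price ≈ slope * size + intercept
slope, intercept = np.polyfit(sizes, prices, deg=1)

new_size = 100
print(f"Predicted price for {new_size} m²: {slope * new_size + intercept:.0f}k")
```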
Multiple Linear Regression
This kind of regression uses more than one input feature to make predictions. For example, both size and number of bedrooms can be used to predict house price.
Even though the relationship is still linear, the model no longer fits a simple line: it fits a flat surface (a plane, or more generally a hyperplane) with one dimension for each input feature.
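Here is a rough sketch of the same idea with two features, again on invented numbers. NumPy's least-squares solver finds the coefficients, and a column of ones lets the model learn an intercept:

```python
import numpy as np

# Invented data: each row is [size in m², number of bedrooms]
X = np.array([[50, 1], [70, 2], [90, 2], [110, 3], [130, 4]], dtype=float)
y = np.array([150, 205, 255, 320, 370], dtype=float)  # prices in thousands

# Prepend a column of ones so the model can learn an intercept
X_b = np.column_stack([np.ones(len(X)), X])

# Least-squares fit: finds the coefficients that minimise squared error
coeffs, *_ = np.linalg.lstsq(X_b, y, rcond=None)

new_house = np.array([1.0, 100.0, 3.0])  # [intercept term, size, bedrooms]
print(f"Predicted price: {new_house @ coeffs:.0f}k")
```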
Polynomial Regression
This one is used when the relationship between the input and output isn’t a straight line — meaning it curves or bends.
Instead of fitting a straight line, it fits a curved line (a polynomial function) to the data to better capture complex patterns.
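A quick sketch on invented data that curves upward; `np.polyfit` with `deg=2` fits a quadratic instead of a straight line:

```python
import numpy as np

# Invented data where the output curves upward rather than growing linearly
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([1.0, 2.1, 5.2, 10.3, 17.1, 26.2])  # roughly y = x² + 1

# Fit a degree-2 polynomial: y ≈ a·x² + b·x + c
a, b, c = np.polyfit(x, y, deg=2)

x_new = 6.0
print(f"Prediction at x = {x_new}: {a * x_new**2 + b * x_new + c:.1f}")
```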
Key Concepts to Understand
Understanding these foundational ideas will help us grasp any type of regression model, whether it's linear, multiple, or polynomial.
1. Features and Targets
Features (Inputs / Independent Variables): These are the variables that we provide to the model to help it make predictions.
Examples: Size of the house, number of bedrooms, location score.
Targets (Outputs / Dependent Variables): The value that we want the model to predict, which depends on the features.
Example: Price of the house.
2. Residual/Error
A residual is the difference between the actual value and the predicted value. It tells us how far off our prediction was for a given data point.
$$\text{Residual} = y_{\text{actual}} - y_{\text{predicted}}$$
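For example, if a house actually sold for 320k but the model predicted 300k, the residual is 320k − 300k = 20k, meaning the model under-predicted by 20k.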
3. Loss (Mean Squared Error - MSE)
MSE is a common metric used to measure how well a regression model is performing.
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
It calculates the average of the squared differences between the actual values and the predicted values (these differences are called residuals).
Squaring the residuals ensures all errors are positive and penalizes larger errors more heavily.
A lower MSE means the model's predictions are closer to the actual values, indicating better performance.
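Here is a tiny sketch of computing residuals and MSE with NumPy, using hypothetical actual and predicted prices:

```python
import numpy as np

# Hypothetical actual vs. predicted house prices (in thousands)
y_actual = np.array([300, 250, 420, 310], dtype=float)
y_predicted = np.array([290, 265, 400, 320], dtype=float)

residuals = y_actual - y_predicted   # how far off each prediction was
mse = np.mean(residuals ** 2)        # average of the squared residuals
print(f"Residuals: {residuals}")
print(f"MSE: {mse:.2f}")
```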
4. Prediction from Input Features
The model’s goal is to learn a function that accurately maps the input features to the target output. It improves over time by comparing its predictions to the actual values in the training data and minimizing the loss function (like Mean Squared Error). This process helps the model adjust its internal parameters to make better predictions as it learns from more data.
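As a small preview of how that adjustment works (we'll go through it properly in the next part), here is a minimal gradient-descent sketch for a single-feature linear model, with invented data and an illustrative learning rate:

```python
import numpy as np

# Invented data that roughly follows y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

w, b = 0.0, 0.0  # start from arbitrary parameter values
lr = 0.01        # learning rate: how big each adjustment step is

for _ in range(2000):
    y_pred = w * x + b                  # current predictions
    error = y_pred - y                  # residuals with the sign flipped
    w -= lr * 2 * np.mean(error * x)    # gradient of MSE with respect to w
    b -= lr * 2 * np.mean(error)        # gradient of MSE with respect to b

print(f"Learned parameters: w = {w:.2f}, b = {b:.2f}")  # close to 2 and 1
```

Each pass compares predictions to the actual values, then nudges the parameters in the direction that lowers the MSE; this is exactly the "adjust its internal parameters" step described above.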
Summary
Regression is a supervised learning technique used to predict continuous numeric values from input features by mapping inputs to outputs. It's crucial for making informed predictions in various fields like finance, healthcare, and retail. Key types include linear, multiple linear, and polynomial regression, each suited to different relationships between input and output. Understanding features, targets, residuals, and Mean Squared Error (MSE) is essential, as these concepts help in evaluating model performance and improving predictions over time.
What’s Next
In the next part of this series, we’ll take a closer look at Linear Regression — one of the most foundational techniques in machine learning. We’ll learn how it works, why it’s effective for predicting continuous values, and how to implement it step-by-step using Python and NumPy. We’ll break down the core math, visualize how the model fits a straight line through data, and understand how it minimizes errors to improve its predictions over time.