🎯 "Predicting the Future: A Hands-On Guide to Linear Regression Using Python"


"Machine learning is not magic — it’s math made powerful."
— Tilak Savani
🧠 Introduction
Machine learning is transforming industries, from finance to healthcare. At the heart of it lies a simple but powerful tool — Linear Regression. Whether you're predicting housing prices or analyzing stock trends, linear regression is often your first step into predictive modeling.
In this post, we'll break it down step by step and implement it using scikit-learn, one of Python's most widely used ML libraries.
📈 What is Linear Regression?
Linear Regression is a supervised learning algorithm used for predicting continuous values. It fits the straight line (called the regression line) that minimizes the sum of squared vertical distances between the line and your data points.
The formula:
y = mx + c
Where:
y: predicted output
x: input features
m: slope (weights)
c: intercept (bias)
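As a quick illustration of the formula, here is a minimal sketch that evaluates the line for a single input. The function name `predict_line` and the sample values for m and c are assumptions chosen only to show the arithmetic; they are not fitted to the dataset.

```python
def predict_line(x: float, m: float, c: float) -> float:
    """Return y = m*x + c for a single input x."""
    return m * x + c


# Hypothetical slope and intercept, chosen only for demonstration.
example_m = 10.0
example_c = 5.0

# With m = 10 and c = 5, an input of 2 gives 10*2 + 5.
example_y = predict_line(2.0, example_m, example_c)
print(example_y)
```

Changing m tilts the line, while changing c shifts it up or down without changing its steepness.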
⚙️ How It Works (Step-by-Step)
1. Collecting the Data
We'll use a simple dataset: Hours Studied vs Exam Score
| Hours Studied | Exam Score |
| --- | --- |
| 1.5 | 40 |
| 3.0 | 60 |
| 4.5 | 75 |
| 5.5 | 82 |
| 6.0 | 90 |
2. Visualizing the Data
Plotting helps us understand the trend — does more studying lead to better scores?
3. Building the Model
We'll use scikit-learn to define the linear regression model.
4. Training the Model
Feed data to the model so it learns the relationship between study time and scores.
5. Making Predictions
Once trained, we can predict scores based on study hours.
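The five steps above can be sketched in a few lines. This is only a rough outline under stated assumptions: it uses NumPy's `polyfit` as a lightweight stand-in for a scikit-learn model, it checks the trend with a correlation coefficient rather than a plot, and the 5.0-hour input is a hypothetical new student.

```python
import numpy as np

# Step 1: the Hours Studied vs Exam Score dataset.
hours = np.array([1.5, 3.0, 4.5, 5.5, 6.0])
scores = np.array([40, 60, 75, 82, 90])

# Step 2 (numeric complement to a plot): Pearson correlation.
# A value near +1 suggests a strong positive linear trend.
r = np.corrcoef(hours, scores)[0, 1]

# Steps 3-4: "build and train" by fitting a straight line with least squares.
# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept].
slope, intercept = np.polyfit(hours, scores, 1)

# Step 5: predict the score for a new input.
new_hours = 5.0  # hypothetical student who studied 5 hours
predicted_score = slope * new_hours + intercept

print(f"correlation r = {r:.3f}")
print(f"predicted score for {new_hours} hours = {predicted_score:.1f}")
```

A degree-1 `polyfit` solves the same least-squares problem as ordinary linear regression with one feature, so the slope and intercept should agree with what a scikit-learn `LinearRegression` model learns on this data.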
🧮 Step-by-Step Mathematical Derivation
We’ll now calculate the best-fit line manually using the least squares method.
Step 1: Calculate the means
x̄ = mean of x values
ȳ = mean of y values
From our data:
x̄ = (1.5 + 3 + 4.5 + 5.5 + 6) / 5 = 4.1
ȳ = (40 + 60 + 75 + 82 + 90) / 5 = 69.4
Step 2: Calculate slope (m)
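m = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / Σ[(xᵢ - x̄)²]
For our data:
Σ[(xᵢ - x̄)(yᵢ - ȳ)] = 76.44 + 10.34 + 2.24 + 17.64 + 39.14 = 145.8
Σ[(xᵢ - x̄)²] = 6.76 + 1.21 + 0.16 + 1.96 + 3.61 = 13.7
m = 145.8 / 13.7 ≈ 10.64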
Step 3: Calculate intercept (c)
c = ȳ - m * x̄
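c = 69.4 - 10.6423 * 4.1 ≈ 25.77 (using the unrounded slope m ≈ 10.6423)
✅ Final Linear Regression Equation
y = 10.64x + 25.77
So, every extra hour of study increases the predicted score by about 10.64 points. The intercept says a student with zero study hours would be predicted to score about 25.77 marks, though that is an extrapolation below the smallest observed value (1.5 hours).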
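The derivation can be checked with a short script. This is a minimal sketch in plain Python that follows the three steps literally; the variable names (`s_xy`, `s_xx`, and so on) are my own labels, not part of any library.

```python
hours = [1.5, 3.0, 4.5, 5.5, 6.0]
scores = [40, 60, 75, 82, 90]

n = len(hours)

# Step 1: means of x and y.
x_mean = sum(hours) / n
y_mean = sum(scores) / n

# Step 2: slope = sum of cross-deviations / sum of squared x-deviations.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(hours, scores))
s_xx = sum((x - x_mean) ** 2 for x in hours)
m = s_xy / s_xx

# Step 3: intercept from the means and the slope.
c = y_mean - m * x_mean

print(f"x_mean = {x_mean:.2f}, y_mean = {y_mean:.2f}")
print(f"m = {m:.2f}, c = {c:.2f}")
```

Keeping the unrounded slope through the intercept step gives c ≈ 25.77, while plugging in the rounded 10.64 gives 25.78, so a difference in the last digit is only a rounding effect.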
🧪 Code Implementation
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Data
X = np.array([1.5, 3.0, 4.5, 5.5, 6.0]).reshape(-1, 1)
y = np.array([40, 60, 75, 82, 90])
# Train the model
model = LinearRegression()
model.fit(X, y)
# Predict
predicted = model.predict(X)
# Plot
plt.scatter(X, y, color='blue', label='Actual')
plt.plot(X, predicted, color='red', label='Regression Line')
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.title("Linear Regression Example")
plt.legend()
plt.show()
# Show equation
print(f"Equation: y = {model.coef_[0]:.2f}x + {model.intercept_:.2f}")
📊 Sample Output
Equation: y = 10.64x + 25.77
This matches the manual least-squares calculation up to rounding: every extra hour studied increases the predicted score by approximately 10.64 points.
🧠 Why Linear Regression Matters
✅ Simple yet powerful
✅ Great baseline model
✅ Easy to interpret
✅ Works well when data has a linear trend
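One way to check whether data really has a linear trend is to look at the residuals (actual minus predicted) and at R², the share of variance the line explains. The sketch below does this in plain Python for the study-hours data; treating small, sign-alternating residuals and an R² near 1 as "linear enough" is a rule of thumb, not a formal test.

```python
hours = [1.5, 3.0, 4.5, 5.5, 6.0]
scores = [40, 60, 75, 82, 90]

n = len(hours)
x_mean = sum(hours) / n
y_mean = sum(scores) / n

# Least-squares slope and intercept for a single feature.
m = sum((x - x_mean) * (y - y_mean) for x, y in zip(hours, scores)) / sum(
    (x - x_mean) ** 2 for x in hours
)
c = y_mean - m * x_mean

# Residual = actual score minus the score predicted by the line.
residuals = [y - (m * x + c) for x, y in zip(hours, scores)]

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - y_mean) ** 2 for y in scores)
r_squared = 1 - ss_res / ss_tot

print("Residuals:", [round(e, 2) for e in residuals])
print(f"R^2 = {r_squared:.3f}")
```

If the residuals grew steadily or curved in one direction, that would hint that a straight line is the wrong shape for the data.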
🌐 Real-World Applications
| Domain | Use Case |
| --- | --- |
| Finance | Predicting stock prices |
| Healthcare | Estimating disease progression |
| Business | Sales forecasting |
| Sports | Performance analytics |
🧩 Final Thoughts
Linear Regression is your first step into the exciting world of Machine Learning. It’s intuitive, useful, and forms the base for many advanced models. Mastering it gives you a solid foundation to explore techniques like polynomial regression, decision trees, and even deep learning.
✉️ Subscribe to my blog!
Stay tuned for more hands-on ML tutorials and projects. 🚀 Follow me on Hashnode and let's grow together in the world of Artificial Intelligence.
