Why I Started Machine Learning – A Deep Dive into Day 1 & 2 Learnings

Table of contents
- 👋 Introduction
- 🧠 What is Machine Learning?
- 🧩 Categories of Machine Learning
- 🎯 Supervised Learning in Detail
- 📈 Day 2: Deep Dive into Linear Regression
- 🔢 Univariate Linear Regression
- 🔢 Multivariate Linear Regression
- Practical Implementation
- Part 1: Load and Explore the Data
- Part 2: Preprocess the data
- Part 3: Univariate linear regression
- Part 4: Plot Prediction
- Multiple linear regression
👋 Introduction
Hello there!
I'm Ashish Sharma, a software engineer who’s worked across tech stacks — from backend APIs to frontend UI and DevOps pipelines. But there’s one area I’ve long been curious about, one that powers everything from ChatGPT to fraud detection systems - Machine Learning (ML).
Recently, I decided to stop postponing and formally start my ML journey. I aim to deeply understand concepts, mathematical foundations, and real-world applications. This blog is where I’ll document everything I learn, not just for others but also for my future self.
So here we go — Day 1 and Day 2 are done, and this is what I’ve learned so far.
🧠 What is Machine Learning?
💡 The Definition
Machine Learning is a subfield of Artificial Intelligence that gives machines the ability to learn patterns from data and make decisions or predictions without being explicitly programmed.
🔍 Arthur Samuel (1959):
“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”
It allows computers to:
- Learn from historical data
- Identify patterns or trends
- Make predictions or decisions on new data
📦 Real-Life Examples
| Problem | ML Solution |
| --- | --- |
| Email spam detection | Classify emails as spam or not spam |
| Movie recommendations | Predict what movies you'll like |
| Loan approval | Predict creditworthiness |
| Disease diagnosis | Predict illness based on symptoms/lab results |
| Chatbots (like ChatGPT, Gemini, DeepSeek) | NLP-based conversational models |
🧩 Categories of Machine Learning
Machine Learning algorithms can be broadly classified into three major categories:
1. Supervised Learning
In this approach, the model is trained on labeled data, i.e., each input comes with a corresponding output.
Input: Hours studied
Output: Marks obtained
🧠 The model learns the relationship between input and output, so it can predict future outcomes for new, unseen data.
Common Types:
- Regression – Predict continuous values (e.g., house price)
- Classification – Predict discrete labels (e.g., spam vs not spam)
2. Unsupervised Learning
Here, the data has no labels. The model’s task is to find patterns or groupings within the data.
Examples:
- Clustering (e.g., grouping customers by behavior)
- Dimensionality reduction (e.g., PCA)
3. Reinforcement Learning
An agent learns by interacting with an environment, getting rewards or punishments based on its actions. This mimics how humans learn from feedback.
Examples:
- Game-playing agents (e.g., AlphaGo)
- Robotics
- Stock trading bots
🎯 Supervised Learning in Detail
Since Supervised Learning is the most beginner-friendly and widely used in industry, that’s where I started.
The two primary tasks in supervised learning are:
| Task | Description | Example |
| --- | --- | --- |
| Regression | Predict continuous values | Predicting salary based on experience |
| Classification | Predict category/label | Predicting if an email is spam or not |
📈 Day 2: Deep Dive into Linear Regression
🏗️ What is Linear Regression?
Linear Regression is a supervised learning algorithm used to model the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to observed data.
🔢 Univariate Linear Regression
This is the simplest form — only one feature (independent variable).
✅ Objective:
Find a line of best fit through the data points such that the difference between predicted and actual values is minimized.
📌 Model Formula (Hypothesis)
We want to find the best values for parameters (θ₀ and θ₁) in the following equation:
y = θ₀ + θ₁ * x
Where:
- x is the input feature
- y is the output label
- θ₀ is the intercept (bias)
- θ₁ is the slope (weight)
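To make the formula concrete, here is a minimal sketch of the hypothesis as a Python function. The function name `predict` and the parameter values are my own placeholders, chosen only for illustration; they are not values learned from any dataset.

```python
def predict(x, theta0, theta1):
    """Return the model's prediction y = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Hypothetical parameters, picked by hand for illustration only
theta0, theta1 = 10.0, 0.5

# Prediction for a single input value x = 60
y_hat = predict(60, theta0, theta1)
print(y_hat)  # 10.0 + 0.5 * 60 = 40.0
```

Training is just the search for the θ₀ and θ₁ that make these predictions as close as possible to the real labels.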
📉 Cost Function (Mean Squared Error)
We need a way to measure how good our model is. That's where the cost function comes in.
J(θ₀, θ₁) = (1/2m) * Σ(hθ(xᵢ) - yᵢ)²
Where:
- m = number of data points
- hθ(xᵢ) = predicted output
- yᵢ = actual output
The goal is to minimize this cost.
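As a quick sanity check of the formula, the sketch below evaluates the cost on a tiny hand-made dataset where the true relationship is y = 2x. The data and the helper name `mse_cost` are assumptions for illustration; the cost should be zero for the perfect parameters and positive for anything else.

```python
import numpy as np

def mse_cost(x, y, theta0, theta1):
    """Cost J = (1 / 2m) * sum((prediction - actual)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy data that follows y = 2x exactly
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

perfect = mse_cost(x, y, 0.0, 2.0)  # predictions match the labels
worse = mse_cost(x, y, 0.0, 1.0)    # errors are -1, -2, -3 -> (1 + 4 + 9) / 6

print(perfect)  # 0.0
print(worse)    # about 2.33
```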
🧮 Optimization using Gradient Descent
Gradient Descent is the algorithm used to minimize the cost function by updating θ₀ and θ₁ gradually.
Update Rules:
θ₀ := θ₀ - α (1/m) Σ(hθ(xᵢ) - yᵢ)
θ₁ := θ₁ - α (1/m) Σ(hθ(xᵢ) - yᵢ) * xᵢ
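Where:
- α = learning rate (controls step size)

Both parameters are updated simultaneously, using gradients computed from the current values of θ₀ and θ₁.
🧠 Note:
- A learning rate that is too high can overshoot the minimum and may even diverge.
- A learning rate that is too low leads to very slow convergence.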
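To see the effect of the learning rate, here is a small experiment on toy data that follows y = 1 + 2x. The helper `run_gradient_descent` and the three α values are my own choices for this sketch; the exact thresholds depend on the data, so treat the numbers as illustrative rather than general.

```python
import numpy as np

def run_gradient_descent(x, y, alpha, num_iters):
    """Run batch gradient descent from (0, 0) and return theta0, theta1, final cost."""
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(num_iters):
        error = (theta0 + theta1 * x) - y
        grad0 = error.sum() / m
        grad1 = (error * x).sum() / m
        # simultaneous update: both gradients were computed before either change
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    cost = (((theta0 + theta1 * x) - y) ** 2).sum() / (2 * m)
    return theta0, theta1, cost

# Toy data that follows y = 1 + 2x exactly
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

results = {}
for alpha in (0.001, 0.1, 0.5):
    results[alpha] = run_gradient_descent(x, y, alpha, 50)
    print(f"alpha = {alpha}: cost after 50 iterations = {results[alpha][2]:.4g}")
```

On this data, the smallest rate is still far from the minimum after 50 iterations, the middle one gets close, and the largest one blows up instead of converging.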
📊 Visual Intuition
Imagine standing on a 3D hill (cost surface). Gradient descent tells you which direction to step in (downhill) and how big your step should be — until you reach the bottom (minimized cost).
🔢 Multivariate Linear Regression
In real-world data, we often have multiple features.
Model Equation:
y = θ₀ + θ₁ * x₁ + θ₂ * x₂ + ... + θₙ * xₙ
Now:
- x₁, x₂, ..., xₙ are the input features
- The equation still defines a linear relationship, but in n-dimensional space
- Gradient descent works the same, just using vectors instead of scalars.
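Here is a minimal sketch of what "vectors instead of scalars" looks like in NumPy. The weights, features, and targets are made-up numbers for illustration. Prepending a column of ones lets θ₀ be handled like any other weight.

```python
import numpy as np

# Hypothetical parameters [theta0, theta1, theta2]
theta = np.array([5.0, 2.0, 3.0])

# Two samples with two features each (made-up values)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([13.0, 20.0])

# Prepend a column of ones so theta0 acts as the intercept
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# All predictions in a single matrix-vector product
y_hat = X_b @ theta
print(y_hat)  # [13. 23.]

# Gradient of the cost for all parameters at once
m = X_b.shape[0]
gradient = X_b.T @ (y_hat - y) / m
print(gradient)  # [1.5 4.5 6. ]
```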
Practical Implementation
Part 1: Load and Explore the Data
import pandas as pd
# read the dataset from the CSV file
df = pd.read_csv('dataset/StudentsPerformance.csv')
# display first five rows
df.head()
# Check dataset Info
df.info()
#check for missing values in each column
df.isnull().sum()
# get Stat summary
df.describe()
Part 2: Preprocess the data
# preprocessing the data
df.head()
# convert gender, lunch, and test preparation course to numeric values for regression
df['gender'] = df['gender'].map({'male': 0, 'female': 1})
df['lunch'] = df['lunch'].map({'standard': 1, 'free/reduced': 0})
df['test preparation course'] = df['test preparation course'].map({'completed': 1, 'none': 0})
# display updated data
df.head()
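One thing worth checking after a step like this: pandas `map` with a dictionary returns NaN for any value that is not a key in the dictionary, so a typo or a different capitalization silently becomes a missing value. The small sketch below uses a made-up three-row DataFrame (not the real dataset) to show the effect.

```python
import pandas as pd

# Made-up sample; the last value differs in capitalization on purpose
sample = pd.DataFrame({"lunch": ["standard", "free/reduced", "Standard"]})

encoded = sample["lunch"].map({"standard": 1, "free/reduced": 0})

# Values missing from the mapping become NaN
unmapped = int(encoded.isnull().sum())
print(unmapped)  # 1
```

Running `df.isnull().sum()` again after the mapping step is a cheap way to confirm that every category was covered.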
Part 3: Univariate linear regression
# import numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt
# use reading score to predict math score for single-feature linear regression
x = df['reading score'].values
y = df['math score'].values
# number of samples
m = len(x)
# define cost function
def compute_cost(x, y, w, b):
    """
    Computes the Mean Squared Error (MSE) cost function J(w, b) for linear regression.
    Args:
        x (ndarray): Input features of shape (m,)
        y (ndarray): True target values of shape (m,)
        w (float): Weight parameter
        b (float): Bias term
    Returns:
        cost (float): The value of the cost function
    """
    m = x.shape[0]
    total_cost = 0.0
    for i in range(m):
        f_wb = np.dot(w, x[i]) + b
        total_cost += (f_wb - y[i])**2
    total_cost /= (2*m)
    return total_cost
# compute gradient for w, b
def compute_gradient(x, y, w, b):
    """
    Computes the gradient of the cost function J(w, b) with respect to parameters w and b.
    Args:
        x (ndarray): Input features of shape (m,)
        y (ndarray): True target values of shape (m,)
        w (float): Weight parameter
        b (float): Bias term
    Returns:
        dj_dw (float): Gradient of cost with respect to w
        dj_db (float): Gradient of cost with respect to b
    """
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = np.dot(w, x[i]) + b
        dj_dw += (f_wb - y[i])*x[i]
        dj_db += (f_wb - y[i])
    dj_dw /= m
    dj_db /= m
    return dj_dw, dj_db
def gradient_descent(x, y, w_in, b_in, alpha, num_iterations):
    """
    Performs batch gradient descent to learn the optimal parameters w and b.
    Args:
        x (ndarray): Input feature values of shape (m,)
        y (ndarray): Target values of shape (m,)
        w_in (float): Initial weight
        b_in (float): Initial bias
        alpha (float): Learning rate (step size)
        num_iterations (int): Number of iterations to run gradient descent
    Returns:
        w (float): Learned weight after gradient descent
        b (float): Learned bias after gradient descent
    Process:
        - For each iteration:
            1. Compute the gradient (partial derivatives) of the cost function J with respect to w and b
            2. Update w and b using the gradients scaled by the learning rate
            3. Every 100 iterations, compute and print the current cost and parameters for tracking
    """
    w = w_in
    b = b_in
    J_history = []
    for i in range(num_iterations):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        # update params
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # save and print cost J every 100 iterations
        if i % 100 == 0:
            cost = compute_cost(x, y, w, b)
            J_history.append(cost)
            print(f"Iteration {i:4}: Cost {cost:.2f}, w = {w:.2f}, b = {b:.2f}")
    return w, b
# initial values of w, b, alpha (learning rate), and number of iterations
w_init, b_init, alpha, iterations = 0, 0, 0.0001, 2000
# train the model
w_final, b_final = gradient_descent(x, y, w_init, b_init, alpha, iterations)
print(f"\nFinal parameters: w = {w_final:.2f}, b = {b_final:.2f}")
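A useful way to check a hand-written gradient descent is to compare it with a closed-form least-squares fit. The sketch below does that with `np.polyfit` on synthetic data (not the student dataset); the data, the learning rate, and the iteration count are all assumptions chosen so the comparison converges quickly.

```python
import numpy as np

# Synthetic data: y = 3x + 4 plus noise (illustrative only)
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=200)
y_demo = 3.0 * x_demo + 4.0 + rng.normal(0, 1.0, size=200)

# Vectorized batch gradient descent, same update rules as above
w, b = 0.0, 0.0
alpha = 0.01
for _ in range(10000):
    error = w * x_demo + b - y_demo
    grad_w = np.mean(error * x_demo)
    grad_b = np.mean(error)
    w, b = w - alpha * grad_w, b - alpha * grad_b

# Closed-form least-squares line; polyfit returns [slope, intercept] for deg=1
w_ref, b_ref = np.polyfit(x_demo, y_demo, deg=1)

print(f"gradient descent: w = {w:.4f}, b = {b:.4f}")
print(f"np.polyfit:       w = {w_ref:.4f}, b = {b_ref:.4f}")
```

If the two results disagree noticeably, gradient descent probably has not converged yet. On unscaled inputs the intercept tends to converge much more slowly than the slope, so it is usually the first place to look.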
Part 4: Plot Prediction
# plotting
plt.scatter(x,y, label= 'Actual')
# recompute the predictions (y-hat) from the learned parameters
f_wb = np.dot(w_final,x) + b_final
plt.plot(x,f_wb, color='red', label='Prediction')
plt.xlabel('Reading Score')
plt.ylabel('Math Score')
plt.title('Linear Regression Fit using Gradient Descent')
plt.legend()
plt.grid(True)
plt.show()
Multiple linear regression
# getting the multiple features
x_features = ['reading score', 'writing score']
x_train = df.loc[:,x_features].values
y_train = df['math score'].values
# print(x_train)
# print(y_train)
# Given the value ranges in the training set, feature scaling is needed.
# If we do not normalise, features with a larger range (here 0-100) can dominate the gradient and slow down or break convergence.
mean = np.mean(x_train, axis=0)
sigma = np.std(x_train,axis=0)
x_train_norm = (x_train - mean)/ sigma
m = x_train_norm.shape[0]
X_b = np.hstack([np.ones((m, 1)), x_train_norm])
# Cost function
def compute_cost(X, y, w):
    """
    Compute the cost J(w) for linear regression.
    Parameters:
        X : (m, n+1) numpy array of input features
        y : (m,) numpy array of target values
        w : (n+1,) numpy array of weights
    Returns:
        J : float, the cost
    """
    m = X.shape[0]
    predictions = np.dot(X, w)
    errors = predictions - y
    cost = (1 / (2 * m)) * np.dot(errors, errors)
    return cost
# Gradient function
def compute_gradient(X, y, w):
    """
    Compute gradient for linear regression cost function.
    Parameters:
        X : (m, n+1) input features
        y : (m,) target values
        w : (n+1,) weight vector
    Returns:
        grad : (n+1,) numpy array of gradients
    """
    m = X.shape[0]
    predictions = np.dot(X, w)
    errors = predictions - y
    gradient = (1 / m) * np.dot(X.T, errors)
    return gradient
# Gradient Descent
def gradient_descent(X, y, w, alpha, num_iters):
    """
    Performs gradient descent to learn w.
    Parameters:
        X : (m, n+1) feature matrix
        y : (m,) target values
        w : (n+1,) initial weights
        alpha : float, learning rate
        num_iters : int, number of iterations
    Returns:
        w : learned weights
        J_history : list of cost values
    """
    J_history = []
    for i in range(num_iters):
        grad = compute_gradient(X, y, w)
        w = w - alpha * grad
        J_history.append(compute_cost(X, y, w))
    return w, J_history
w_init = np.zeros(X_b.shape[1])
alpha = 0.01
num_iters = 1000
w_final, J_history = gradient_descent(X_b, y_train, w_init, alpha, num_iters)
# Final predictions and cost for Multivariate Linear Regression
# Predict using the learned weights
y_pred = np.dot(X_b, w_final)
# Calculate final cost
final_cost = (1 / (2 * m)) * np.sum((y_pred - y_train) ** 2)
# Output
print("Final learned weights (θ):", w_final)
print("Final cost (J):", final_cost)
# Compare first 5 actual vs predicted values
for i in range(5):
    print(f"Actual: {y_train[i]:.2f}, Predicted: {y_pred[i]:.2f}")
import matplotlib.pyplot as plt
plt.figure(figsize=(10,6))
plt.scatter(range(len(y_train)), y_train, label='Actual', alpha=0.7)
plt.scatter(range(len(y_pred)), y_pred, label='Predicted', alpha=0.7)
plt.title("Actual vs Predicted Math Scores (Multivariate LR)")
plt.xlabel("Data Point Index")
plt.ylabel("Math Score")
plt.legend()
plt.grid(True)
plt.show()
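One detail that is easy to miss: because the model was trained on normalised features, any new sample has to be scaled with the same training mean and sigma before prediction. The sketch below illustrates this with a made-up training matrix and hypothetical weights (not the values learned above); `predict_new` is a helper name I introduced for the example.

```python
import numpy as np

# Made-up training features: [reading score, writing score]
x_train_demo = np.array([[60.0, 62.0],
                         [70.0, 68.0],
                         [80.0, 85.0],
                         [90.0, 88.0]])
mean_demo = x_train_demo.mean(axis=0)
sigma_demo = x_train_demo.std(axis=0)

# Hypothetical learned weights: [bias, reading weight, writing weight]
w_demo = np.array([66.0, 8.0, 5.0])

def predict_new(x_new, mean, sigma, w):
    """Scale a raw sample with the training statistics, then apply the weights."""
    x_norm = (np.asarray(x_new, dtype=float) - mean) / sigma
    x_b = np.concatenate(([1.0], x_norm))  # prepend 1 for the bias term
    return float(x_b @ w)

# A sample exactly at the training mean normalises to zeros,
# so the prediction equals the bias weight
at_mean = predict_new(mean_demo, mean_demo, sigma_demo, w_demo)
print(at_mean)  # 66.0

# A sample with above-average scores gets a higher prediction
above = predict_new([85.0, 90.0], mean_demo, sigma_demo, w_demo)
print(above > at_mean)  # True
```

This also gives a nice interpretation of the bias weight after normalisation: it is the predicted score for a student with average feature values.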
🔚 Final Thoughts
Completing Day 1 and Day 2 of my Machine Learning journey has been truly energizing.
I now have a solid grip on:
- What Machine Learning is
- The categories of ML and why Supervised Learning is a logical starting point
- The full math and intuition behind Linear Regression, from hypothesis to cost function and gradient descent
- The difference between univariate and multivariate models
- Implementing both approaches from scratch in Python
Writing this blog helped me reinforce concepts that once felt fuzzy. More importantly, it gave me clarity — and a place to return when revision time comes.
🔭 What's Next?
In the next post, I’ll cover:
- Ridge and Lasso Regression: What happens when Linear Regression overfits?
- Understanding regularization: Bias vs Variance, L1 vs L2
- Complete Python implementation from scratch and with sklearn
🙌 Final Note
If you're also on a similar journey — starting or revising ML fundamentals — I hope this post helped you even a little bit.
Feel free to follow, share feedback, or just say hi.
Thanks for reading. Let’s keep learning!
– Ashish Sharma
