Why I Started Machine Learning – A Deep Dive into Day 1 & 2 Learnings

Ashish Sharma

👋 Introduction

Hello there!

I'm Ashish Sharma, a software engineer who has worked across tech stacks, from backend APIs to frontend UI and DevOps pipelines. But there's one area I've long been curious about, one that powers everything from ChatGPT to fraud detection systems: Machine Learning (ML).

Recently, I decided to stop postponing and formally start my ML journey. I aim to deeply understand concepts, mathematical foundations, and real-world applications. This blog is where I’ll document everything I learn, not just for others but also for my future self.

So here we go — Day 1 and Day 2 are done, and this is what I’ve learned so far.


🧠 What is Machine Learning?

💡 The Definition

Machine Learning is a subfield of Artificial Intelligence that gives machines the ability to learn patterns from data and make decisions or predictions without being explicitly programmed.

🔍 Arthur Samuel (1959):
“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”

It allows computers to:

  • Learn from historical data

  • Identify patterns or trends

  • Make predictions or decisions on new data


📦 Real-Life Examples

| Problem | ML Solution |
| --- | --- |
| Email spam detection | Classify emails as spam or not spam |
| Movie recommendations | Predict what movies you'll like |
| Loan approval | Predict creditworthiness |
| Disease diagnosis | Predict illness based on symptoms/lab results |
| Chatbots (like ChatGPT, Gemini, DeepSeek) | NLP-based conversational models |

🧩 Categories of Machine Learning

Machine Learning algorithms can be broadly classified into three major categories:

1. Supervised Learning

In this approach, the model is trained on labeled data, i.e., each input comes with a corresponding output.

  • Input: Hours studied

  • Output: Marks obtained

🧠 The model learns the relationship between input and output, so it can predict future outcomes for new, unseen data.

Common Types:

  • Regression – Predict continuous values (e.g., house price)

  • Classification – Predict discrete labels (e.g., spam vs not spam)
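
To make this concrete, here is a minimal sketch (not part of my notebook, with made-up numbers) that treats the same kind of toy data first as a regression problem and then as a classification problem using scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# toy labeled data: hours studied -> marks obtained
hours = np.array([[1], [2], [3], [4], [5], [6]])
marks = np.array([35, 48, 55, 62, 74, 85])          # continuous target -> regression
passed = (marks >= 50).astype(int)                  # discrete label -> classification

reg = LinearRegression().fit(hours, marks)
clf = LogisticRegression().fit(hours, passed)

print(reg.predict([[7]]))   # predicted marks for 7 hours of study
print(clf.predict([[7]]))   # predicted pass/fail for 7 hours of study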


2. Unsupervised Learning

Here, the data has no labels. The model’s task is to find patterns or groupings within the data.

Examples:

  • Clustering (e.g., grouping customers by behavior)

  • Dimensionality reduction (e.g., PCA)
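
As a quick illustration (made-up data, not from my notebook), k-means can discover groups of customers from spend and visit counts without being given any labels:

import numpy as np
from sklearn.cluster import KMeans

# each row: [annual spend, visits per month] for one customer
customers = np.array([[200, 2], [220, 3], [1500, 25], [1600, 30], [800, 10]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)
print(labels)   # cluster assignments discovered purely from the data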


3. Reinforcement Learning

An agent learns by interacting with an environment, getting rewards or punishments based on its actions. This mimics how humans learn from feedback.

Examples:

  • Game-playing agents (e.g., AlphaGo)

  • Robotics

  • Stock trading bots
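
To get a feel for the reward-feedback loop, here is a toy bandit-style sketch (entirely illustrative, with made-up reward probabilities) in which an agent learns which of two actions pays off more often:

import random

reward_prob = [0.3, 0.7]   # hidden from the agent: action 1 is better
value = [0.0, 0.0]         # the agent's running estimate of each action's value
counts = [0, 0]
epsilon = 0.1              # fraction of the time spent exploring

for _ in range(1000):
    # explore occasionally, otherwise exploit the best-looking action
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = value.index(max(value))
    reward = 1 if random.random() < reward_prob[action] else 0
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]   # incremental average

print(value)   # estimates should end up close to [0.3, 0.7]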


🎯 Supervised Learning in Detail

Since Supervised Learning is the most beginner-friendly and widely used in industry, that’s where I started.

The two primary tasks in supervised learning are:

| Task | Description | Example |
| --- | --- | --- |
| Regression | Predict continuous values | Predicting salary based on experience |
| Classification | Predict category/label | Predicting if an email is spam or not |

📈 Day 2: Deep Dive into Linear Regression

🏗️ What is Linear Regression?

Linear Regression is a supervised learning algorithm used to model the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to observed data.


🔢 Univariate Linear Regression

This is the simplest form — only one feature (independent variable).

✅ Objective:

Find a line of best fit through the data points such that the difference between predicted and actual values is minimized.


📌 Model Formula (Hypothesis)

We want to find the best values for parameters (θ₀ and θ₁) in the following equation:

y = θ₀ + θ₁ * x

Where:

  • x is the input feature

  • y is the output label

  • θ₀ is the intercept (bias)

  • θ₁ is the slope (weight)
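
As code, the hypothesis is just one line; the parameter values below are made up purely for illustration:

def predict(x, theta0, theta1):
    """Return the model's prediction θ₀ + θ₁·x for a single input x."""
    return theta0 + theta1 * x

print(predict(x=5, theta0=2.0, theta1=3.0))   # 2.0 + 3.0 * 5 = 17.0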


📉 Cost Function (Mean Squared Error)

We need a way to measure how good our model is. That's where the cost function comes in.

J(θ₀, θ₁) = (1 / (2m)) * Σ (hθ(xᵢ) - yᵢ)²

Where:

  • m = number of data points

  • hθ(xᵢ) = predicted output

  • yᵢ = actual output

The goal is to minimize this cost.
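
Translated directly into NumPy (with made-up numbers, just to show the shape of the computation), the cost looks like this:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.5, 3.5])
theta0, theta1 = 0.5, 1.0

predictions = theta0 + theta1 * x
cost = np.sum((predictions - y) ** 2) / (2 * len(x))
print(cost)   # ≈ 0.0417 for these numbers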


🧮 Optimization using Gradient Descent

Gradient Descent is the algorithm used to minimize the cost function by updating θ₀ and θ₁ gradually.

Update Rules:

θ₀ := θ₀ - α (1/m) Σ(hθ(xᵢ) - yᵢ)
θ₁ := θ₁ - α (1/m) Σ(hθ(xᵢ) - yᵢ) * xᵢ

Where:

  • α = learning rate (controls step size)

🧠 Note:
If the learning rate is too high, gradient descent can overshoot the minimum.
If it is too low, convergence becomes very slow.
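
One detail worth noting: both parameters are updated simultaneously, using gradients computed from the old values. A single update step looks like this (illustrative numbers only):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.5, 3.5])
theta0, theta1, alpha = 0.0, 0.0, 0.1
m = len(x)

errors = (theta0 + theta1 * x) - y          # hθ(xᵢ) - yᵢ for every data point
grad0 = errors.sum() / m
grad1 = (errors * x).sum() / m
theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
print(theta0, theta1)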


📊 Visual Intuition

Imagine standing on a 3D hill (cost surface). Gradient descent tells you which direction to step in (downhill) and how big your step should be — until you reach the bottom (minimized cost).


🔢 Multivariate Linear Regression

In real-world data, we often have multiple features.

Model Equation:

y = θ₀ + θ₁x₁ + θ₂x₂ + ... + θₙxₙ

Now:

  • x₁, x₂, ..., xₙ are the input features

  • The equation still defines a linear relationship, but in n-dimensional space

Gradient descent works the same, just using vectors instead of scalars.
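
Concretely, the trick (used in the implementation below) is to prepend a column of ones so that θ₀ rides along in the same dot product; the numbers here are made up:

import numpy as np

X = np.array([[1.0, 72, 74],    # each row: [1, x₁, x₂]
              [1.0, 90, 95],
              [1.0, 47, 44]])
theta = np.array([5.0, 0.5, 0.4])

predictions = X @ theta          # θ₀ + θ₁x₁ + θ₂x₂ for every row at once
print(predictions)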


🛠️ Practical Implementation

Part 1: Load and Explore the Data

import pandas as pd

# read the dataset from the CSV file
df = pd.read_csv('dataset/StudentsPerformance.csv')

# display the first five rows
df.head()

# check dataset info (column types, non-null counts)
df.info()

# check for missing values in each column
df.isnull().sum()

# get a statistical summary of the numeric columns
df.describe()

Part 2: Preprocess the data

# preprocessing the data
df.head()

# converting gender, lunch, and test preparation course to numeric values for regression
df['gender'] = df['gender'].map({'male': 0, 'female': 1})
df['lunch'] = df['lunch'].map({'standard': 1, 'free/reduced': 0})
df['test preparation course'] = df['test preparation course'].map({'completed': 1, 'none': 0})

# display updated data
df.head()

Part 3: Univariate linear regression

# import numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt

# using reading score to predict math score for single-feature linear regression
x = df['reading score'].values
y = df['math score'].values

#  number of samples
m = len(x)
# define cost function
def compute_cost(x,y,w,b):
    """
    Computes the Mean Squared Error (MSE) cost function J(w, b) for linear regression.

    Args:
        x (ndarray): Input features of shape (m,)
        y (ndarray): True target values of shape (m,)
        w (float): Weight parameter
        b (float): Bias term

    Returns:
        cost (float): The value of the cost function
    """
    m = x.shape[0]
    total_cost = 0.0
    for i in range(m):
        f_wb = np.dot(w, x[i]) + b
        total_cost += (f_wb - y[i])**2
    total_cost /= (2*m)
    return total_cost


# compute gradient for w, b
def compute_gradient(x,y,w,b):
    """
    Computes the gradient of the cost function J(w, b) with respect to parameters w and b.

    Args:
        x (ndarray): Input features of shape (m,)
        y (ndarray): True target values of shape (m,)
        w (float): Weight parameter
        b (float): Bias term

    Returns:
        dj_dw (float): Gradient of cost with respect to w
        dj_db (float): Gradient of cost with respect to b
    """
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = np.dot(w, x[i]) + b
        dj_dw += (f_wb - y[i])*x[i]
        dj_db += (f_wb - y[i])

    dj_dw /= m
    dj_db /= m
    return dj_dw,dj_db

def gradient_descent(x, y, w_in, b_in, alpha, num_iterations):
    """
    Performs batch gradient descent to learn the optimal parameters w and b.

    Args:
        x (ndarray): Input feature values of shape (m,)
        y (ndarray): Target values of shape (m,)
        w_in (float): Initial weight
        b_in (float): Initial bias
        alpha (float): Learning rate (step size)
        num_iterations (int): Number of iterations to run gradient descent

    Returns:
        w (float): Learned weight after gradient descent
        b (float): Learned bias after gradient descent

    Process:
        - For each iteration:
            1. Compute the gradient (partial derivatives) of the cost function J with respect to w and b
            2. Update w and b using the gradients scaled by the learning rate
            3. Every 100 iterations, compute and print the current cost and parameters for tracking
    """
    w = w_in
    b = b_in
    J_history = []
    for i in range(num_iterations):
        dj_dw, dj_db = compute_gradient(x,y, w, b)

        # update params
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # save and print cost J every 100 iterations
        if i % 100 == 0:
            cost = compute_cost(x,y,w,b)
            J_history.append(cost)
            print(f"Iteration {i:4}: Cost {cost:.2f}, w = {w:.2f}, b = {b:.2f}")
    return w, b
# initial values of w, b, alpha (learning rate), and number of iterations
w_init, b_init, alpha, iterations = 0, 0, 0.0001, 2000

# train the model 
w_final, b_final = gradient_descent(x, y, w_init, b_init, alpha, iterations)

print(f"\nFinal parameters: w = {w_final:.2f}, b = {b_final:.2f}")

Part 4: Plot Prediction

# plotting
plt.scatter(x,y, label= 'Actual')

# compute predictions using the learned parameters w_final and b_final
f_wb = np.dot(w_final,x) + b_final

plt.plot(x,f_wb, color='red', label='Prediction')
plt.xlabel('Reading Score')
plt.ylabel('Math Score')
plt.title('Linear Regression Fit using Gradient Descent')
plt.legend()
plt.grid(True)
plt.show()

Part 5: Multivariate Linear Regression

# getting the multiple features
x_features = ['reading score', 'writing score']
x_train = df.loc[:,x_features].values
y_train = df['math score'].values
# print(x_train)
# print(y_train)

# Given the scale of the training-set values, feature scaling is needed:
# if we do not normalise, features with a larger range (0-100) will dominate and slow down convergence.
mean = np.mean(x_train, axis=0)
sigma = np.std(x_train,axis=0)
x_train_norm = (x_train - mean)/ sigma

m = x_train_norm.shape[0]
X_b = np.hstack([np.ones((m, 1)), x_train_norm])
# Cost function
def compute_cost(X, y, w):
    """
    Compute the cost J(w) for linear regression.

    Parameters:
    X : (m, n+1) numpy array of input features
    y : (m,) numpy array of target values
    w : (n+1,) numpy array of weights

    Returns:
    J : float, the cost
    """
    m = X.shape[0]
    predictions = np.dot(X, w)
    errors = predictions - y
    cost = (1 / (2 * m)) * np.dot(errors, errors)
    return cost

# Gradient function
def compute_gradient(X, y, w):
    """
    Compute gradient for linear regression cost function.

    Parameters:
    X : (m, n+1) input features
    y : (m,) target values
    w : (n+1,) weight vector

    Returns:
    grad : (n+1,) numpy array of gradients
    """
    m = X.shape[0]
    predictions = np.dot(X, w)
    errors = predictions - y
    gradient = (1 / m) * np.dot(X.T, errors)
    return gradient

# Gradient Descent
def gradient_descent(X, y, w, alpha, num_iters):
    """
    Performs gradient descent to learn w.

    Parameters:
    X : (m, n+1) feature matrix
    y : (m,) target values
    w : (n+1,) initial weights
    alpha : float, learning rate
    num_iters : int, number of iterations

    Returns:
    w : learned weights
    J_history : list of cost values
    """
    J_history = []
    for i in range(num_iters):
        grad = compute_gradient(X, y, w)
        w = w - alpha * grad
        J_history.append(compute_cost(X, y, w))
    return w, J_history
w_init = np.zeros(X_b.shape[1])
alpha = 0.01
num_iters = 1000

w_final, J_history = gradient_descent(X_b, y_train, w_init, alpha, num_iters)

# Final predictions and cost for Multivariate Linear Regression

# Predict using the learned weights
y_pred = np.dot(X_b, w_final)

# Calculate final cost
final_cost = (1 / (2 * m)) * np.sum((y_pred - y_train) ** 2)

# Output
print("Final learned weights (θ):", w_final)
print("Final cost (J):", final_cost)

# Compare first 5 actual vs predicted values
for i in range(5):
    print(f"Actual: {y_train[i]:.2f}, Predicted: {y_pred[i]:.2f}")
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
plt.scatter(range(len(y_train)), y_train, label='Actual', alpha=0.7)
plt.scatter(range(len(y_pred)), y_pred, label='Predicted', alpha=0.7)
plt.title("Actual vs Predicted Math Scores (Multivariate LR)")
plt.xlabel("Data Point Index")
plt.ylabel("Math Score")
plt.legend()
plt.grid(True)
plt.show()


🔚 Final Thoughts

Completing Day 1 and Day 2 of my Machine Learning journey has been truly energizing.
I now have a solid grip on:

  • What Machine Learning is

  • The categories of ML and why Supervised Learning is a logical starting point

  • The full math and intuition behind Linear Regression — from hypothesis to cost function and gradient descent

  • The difference between univariate and multivariate models

  • Implementing both approaches from scratch in Python

Writing this blog helped me reinforce concepts that once felt fuzzy. More importantly, it gave me clarity — and a place to return when revision time comes.


🔭 What's Next?

In the next post, I’ll cover:

  • Ridge and Lasso Regression: What happens when Linear Regression overfits?

  • Understanding regularization: Bias vs Variance, L1 vs L2

  • Complete Python implementation from scratch and with sklearn


🙌 Final Note

If you're also on a similar journey — starting or revising ML fundamentals — I hope this post helped you even a little bit.
Feel free to follow, share feedback, or just say hi.

Thanks for reading. Let’s keep learning!

– Ashish Sharma
