How I Built Logistic Regression from Scratch

Intro

As a data science student, my motto is "start hard so things look easy." When I first began learning machine learning, like most beginners I reached for the scikit-learn library, which I personally call the easy, lazy library. It's good and easy, but I'm the kind of person who doesn't like easy things and goes looking for problems 🤣

So that's what I did. I chose an already-cleaned dataset, because my goal was not EDA, and I coded two versions of logistic regression: the cool one with scikit-learn, and a second one built from the math. I used the Sonar dataset with 60 features, where the goal is to classify rocks ('R') vs. mines ('M').

And this is what I learned.

What is Logistic Regression?

Logistic regression helps us answer yes/no questions like:

  • Will this email be spam? (yes/no)

  • Will the customer buy this product? (yes/no)

  • Is this tumor cancerous? (yes/no)

scikit-learn Approach

Scikit-learn Guide

Here's the basic code:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
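
To check how well this black box does, you can score it on held-out data. A minimal sketch, assuming the data was already split so that y_test holds the true labels for X_test:

from sklearn.metrics import accuracy_score

# Compare predictions on unseen data against the true labels
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.3f}")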

This black-box approach worked wonderfully, but I soon realized I didn't truly understand what was happening under the hood, because:

  1. I couldn't explain exactly how predictions were being made

  2. I struggled to customize the algorithm for specific needs

  3. I didn't fully grasp the impact of hyperparameters

  4. Debugging model issues was difficult without fundamental knowledge

This realization prompted me to dive into the mathematical foundations of logistic regression.

Math Approach

First, the most basic thing to do is prepare the data! By that I mean separating the features from the target:

# X = All the clues (columns 0-59)
X = df.drop(columns=60)

# Y = The answer (column 60: 'R' or 'M')
Y = df[60]
Y = (Y == 'R').astype(int)  # Turn 'R'→1, 'M'→0
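
For reference, this assumes the Sonar CSV was loaded with no header row, so the 60 features sit in columns 0-59 and the label in column 60. A minimal loading sketch (the file name below is just a placeholder):

import pandas as pd

# Placeholder file name; the Sonar dataset is a headerless CSV
df = pd.read_csv("sonar_data.csv", header=None)
print(df.shape)  # expected (208, 61): 208 samples, 60 features + 1 label column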

Then we can start our heavy mathematical implementation.

Adding the Bias Term

import numpy as np

def add_bias(X):
    # Add a column of 1's at position 0 so theta[0] acts as the intercept
    return np.c_[np.ones(X.shape[0]), X]

X_with_bias = add_bias(X)

In logistic regression, our hypothesis function is:

$$h_\theta(x) = g(\theta^T x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)$$

Where:

  • g(z) is the sigmoid function

  • θ₀ is our bias term (also called intercept)

  • θ₁ to θₙ are weights for each feature

We initialize all parameters (including θ₀) to zero:

n_features = X.shape[1]  # Number of original features
theta = np.zeros(n_features + 1)  # +1 for the bias term

This gives us:

  • θ[0] = θ₀ (bias)

  • θ[1] = θ₁ (first feature weight)

  • θ[n] = θₙ (last feature weight)
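
A quick shape check makes this bookkeeping concrete. A small sketch using the arrays defined above (the printed shapes assume the 208-sample Sonar data):

print(X_with_bias.shape)  # (208, 61): 60 features + 1 bias column of ones
print(theta.shape)        # (61,): theta[0] is the bias, theta[1:] are feature weights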

1. The Sigmoid Function (Probability Prediction)

The first revelation was understanding how logistic regression transforms linear outputs into probabilities using the sigmoid function:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def proba_predict(X, theta):
    z = np.dot(X, theta) 
    return sigmoid(z)     # Probability (0 to 1)

This S-shaped curve maps any real-valued number to a value between 0 and 1, perfect for probability estimation. Example:

  • If z = 0 → Sigmoid says 0.5 ("Maybe!")

  • If z = 5 → Sigmoid says ~0.99 ("Probably YES!")

  • If z = -5 → Sigmoid says ~0.01 ("Probably NO!")

This helps us turn numbers into probabilities!
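
Those three cases are easy to verify with the sigmoid function defined above:

for z in (-5, 0, 5):
    print(z, round(sigmoid(z), 4))
# Output: -5 0.0067 | 0 0.5 | 5 0.9933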

2. The Cost Function Challenge

The goal of the cost function is to measure how wrong the predictions are (it penalizes bad predictions).

$$J(\theta) = -\frac{1}{n} \sum_{i=1}^n \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$

def cost_function(X, Y, theta):
    n = len(Y)
    h = proba_predict(X, theta)  # Probabilities with sigmoid 
    # or equivalently: h = sigmoid(X.dot(theta))
    cost = -(1/n) * np.sum(Y * np.log(h) + (1-Y) * np.log(1-h))
    return cost

  • If Y=1 and h≈0 (wrong prediction), log(h) → -∞ (high cost).

  • If Y=1 and h≈1 (correct prediction), log(h) → 0 (low cost).
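
A quick sanity check: with all parameters still at zero, every prediction is sigmoid(0) = 0.5, so the starting cost should be -log(0.5) ≈ 0.693 no matter what the data looks like:

initial_cost = cost_function(X_with_bias, Y, np.zeros(X_with_bias.shape[1]))
print(initial_cost)  # ≈ 0.693, i.e. -log(0.5)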

3. Gradient Descent Implementation (Optimizing θ)

The goal here is to adjust θ to reduce the cost.

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^n (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$$

The update rule:

$$\theta_j := \theta_j - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta_j}$$

Understanding how the algorithm actually learns through gradient descent was transformative:

def gradient_descent(X, Y, theta, alpha, num_iterations):
    n = len(Y)
    cost_path = []  #  cost over iterations

    for _ in range(num_iterations):
        h = proba_predict(X, theta)
        gradient = np.dot(X.T, (h - Y)) / n  # Derivative of cost
        theta -= alpha * gradient            # Update θ 
        cost = cost_function(X, Y, theta)
        cost_path.append(cost)

    return theta, cost_path

This implementation forms the backbone of logistic regression in machine learning. By tweaking alpha and num_iterations, you can optimize performance further.
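
Putting the pieces together, a training run could look like the sketch below. The learning rate and iteration count are just illustrative starting points, not tuned values:

# Train on the bias-augmented features, starting from theta = 0
theta = np.zeros(X_with_bias.shape[1])
theta, cost_path = gradient_descent(X_with_bias, Y, theta,
                                    alpha=0.01, num_iterations=5000)

print("Final cost:", cost_path[-1])  # expected to drop below the initial ~0.693

Plotting cost_path is the easiest way to see whether alpha is too large (the cost oscillates) or too small (the cost barely moves).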

Making Predictions

def predict(new_X, theta):
    # Add bias term to new data
    new_X_with_bias = np.c_[np.ones(new_X.shape[0]), new_X]

    # Get probabilities
    probabilities = proba_predict(new_X_with_bias, theta)

    # Convert to binary predictions (threshold at 0.5)
    class_predictions = (probabilities >= 0.5).astype(int)

    return probabilities, class_predictions
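
To close the loop, the from-scratch model can be scored just like the scikit-learn one. A minimal sketch (note that predict() adds the bias column itself, so it takes the raw 60-feature matrix; without a train/test split this is training accuracy, which is optimistic):

probabilities, class_predictions = predict(X, theta)
accuracy = np.mean(class_predictions == Y)
print(f"Training accuracy: {accuracy:.3f}")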

Comparing Both Implementations

In my view as a student, the most important thing is to start with the math so you understand the model behind the scenes. It gives you the ability to analyze and think more clearly than someone who only knows to reach for logistic regression whenever a classification problem shows up. The deep dive also gives you the power to control and customize the implementation. Of course, once I'm working as a data scientist I won't reimplement the math every time, only when I actually need it.

Practical Implications

After completing this exercise:

  1. I could better explain logistic regression to myself and answer questions effectively in class

  2. I gained confidence to implement custom variations of algorithms

  3. My ability to debug model issues improved significantly

  4. I developed a framework for understanding other ML algorithms

Conclusion: Why Every Data Scientist Should Do This

Implementing algorithms from scratch provides invaluable learning. My journey from:

from sklearn.linear_model import LogisticRegression

to understanding and implementing the complete mathematical foundation transformed me from someone who could apply machine learning to someone who truly understands it.

For anyone serious about machine learning, I highly recommend taking this journey with at least one algorithm. The insights you gain will elevate your understanding of all subsequent models you encounter.

The learning never stops!👋

I've open-sourced all my code in this GitHub repository to help you dive deeper:
⛏️ Mine vs. Rock: The Logistic Regression

Join the Learning Journey
🔹 Star the repo if you find it helpful
🔹 Open an Issue if you spot ways to improve
🔹 Fork it to add your own enhancements (maybe regularization or momentum?)

"The best way to learn is to collaborate – I'll be thrilled if you contribute!"
