How I Built Logistic Regression from Scratch


Intro
As a data science student, my motto is "start hard so things look easy." When I first began learning machine learning, like most beginners I reached for the scikit-learn library, which I personally call the easy, lazy library. It is good and simple, but I'm the kind of person who doesn't like easy things and goes looking for problems 🤣
So that's what I did. I chose an already-cleaned dataset because my goal was not EDA, and I coded two versions of logistic regression: the cool one with scikit-learn, and a second one that is a mathematical implementation. I used the Sonar dataset with 60 features, where the goal is to classify rocks ('R') vs. mines ('M').
And this is what I learned.
What is Logistic Regression?
Logistic regression helps us answer yes/no questions like:
Will this email be spam? (yes/no)
Will the customer buy this product? (yes/no)
Is this tumor cancerous? (yes/no)
scikit-learn Approach
Here's the basic code:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
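For context, here is a minimal sketch of how I wire that up end to end. The file name, the label column index, and the split parameters are my own assumptions for illustration, not something the snippet above fixes:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Assumed layout: "sonar.csv" with 60 feature columns, labels in column 60, no header row
df = pd.read_csv("sonar.csv", header=None)
X = df.drop(columns=60)
Y = df[60]
# Hold out part of the data for evaluation (test size and seed are arbitrary choices)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)  # extra iterations so the solver converges on 60 features
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))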
This black-box approach worked wonderfully, but I soon realized I didn't truly understand what was happening under the hood because:
I couldn't explain exactly how predictions were being made
I struggled to customize the algorithm for specific needs
I didn't fully grasp the impact of hyperparameters
Debugging model issues was difficult without fundamental knowledge
This realization prompted me to dive into the mathematical foundations of logistic regression.
Math Approach
First, the most basic thing to do is prepare the data! By that I mean separating the features from the target.
# df is the same Sonar DataFrame (60 feature columns, label in column 60)
# X = all the clues (columns 0-59)
X = df.drop(columns=60)
# Y = the answer (column 60: 'R' or 'M')
Y = df[60]
Y = (Y == 'R').astype(int)  # Turn 'R' → 1, 'M' → 0
Then we can start our heavy mathematical implementation.
Adding the Bias Term
import numpy as np

def add_bias(X):
    # Add a column of 1's at position 0
    return np.c_[np.ones(X.shape[0]), X]

X_with_bias = add_bias(X)
In logistic regression, our hypothesis function is:
$$h_\theta(x) = g(\theta^T x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)$$
Where:
g(z) is the sigmoid function
θ₀ is our bias term (also called the intercept)
θ₁ to θₙ are the weights for each feature
We initialize all parameters (including θ₀) to zero:
n_features = X.shape[1] # Number of original features
theta = np.zeros(n_features + 1) # +1 for the bias term
This gives us:
θ[0] = θ₀ (bias)
θ[1] = θ₁ (first feature weight)
…
θ[n] = θₙ (last feature weight)
1. The Sigmoid Function (Probability Prediction)
The first revelation was understanding how logistic regression transforms linear outputs into probabilities using the sigmoid function:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def proba_predict(X, theta):
    z = np.dot(X, theta)
    return sigmoid(z)  # Probability (0 to 1)
This S-shaped curve maps any real-valued number to a value between 0 and 1, perfect for probability estimation. Example:
If z = 0 → sigmoid says 0.5 ("Maybe!")
If z = 5 → sigmoid says ~0.99 ("Probably YES!")
If z = -5 → sigmoid says ~0.01 ("Probably NO!")
This helps us turn numbers into probabilities!
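A quick sanity check (my own throwaway snippet, not part of the original pipeline) confirms those three values:
# Evaluate the sigmoid at z = -5, 0 and 5
print(sigmoid(np.array([-5.0, 0.0, 5.0])))
# Roughly [0.0067, 0.5, 0.9933]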
2. The Cost Function Challenge
The goal of the cost function is to measure how wrong the predictions are (it penalizes bad predictions).
$$J(\theta) = -\frac{1}{n} \sum_{i=1}^n \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$
def cost_function(X, Y, theta):
    n = len(Y)
    h = proba_predict(X, theta)  # Probabilities from the sigmoid
    # or equivalently:
    # h = sigmoid(X.dot(theta))
    cost = -(1/n) * np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h))
    return cost
If Y = 1 and h ≈ 0 (wrong prediction), log(h) → -∞ (high cost).
If Y = 1 and h ≈ 1 (correct prediction), log(h) → 0 (low cost).
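To see that behaviour with actual numbers, here is a tiny check of my own (the labels and probabilities below are made up purely for illustration):
# Cost for confident-correct vs confident-wrong predictions on the same labels
y_true = np.array([1.0, 1.0])
h_good = np.array([0.99, 0.99])  # confident and correct -> small cost
h_bad = np.array([0.01, 0.01])   # confident and wrong  -> large cost
print(-np.mean(y_true * np.log(h_good) + (1 - y_true) * np.log(1 - h_good)))  # ~0.01
print(-np.mean(y_true * np.log(h_bad) + (1 - y_true) * np.log(1 - h_bad)))    # ~4.61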
3. Gradient Descent Implementation (Optimizing θ)
The goal here is to adjust θ iteratively to reduce the cost.
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^n (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$$
The update rule:
$$\theta_j := \theta_j - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta_j}$$
Understanding how the algorithm actually learns through gradient descent was transformative:
def gradient_descent(X, Y, theta, alpha, num_iterations):
    n = len(Y)
    cost_path = []  # cost over iterations
    for _ in range(num_iterations):
        h = proba_predict(X, theta)
        gradient = np.dot(X.T, (h - Y)) / n  # Derivative of the cost w.r.t. θ
        theta -= alpha * gradient            # Update θ
        cost = cost_function(X, Y, theta)
        cost_path.append(cost)
    return theta, cost_path
This implementation forms the backbone of logistic regression in machine learning. By tweaking alpha and num_iterations, you can optimize performance further.
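Here is how I call it, as a minimal sketch; the learning rate and iteration count below are arbitrary values I picked for illustration, not carefully tuned ones:
# Train on the bias-augmented features (alpha and num_iterations are illustrative choices)
theta = np.zeros(X_with_bias.shape[1])
theta, cost_path = gradient_descent(X_with_bias, Y.values, theta, alpha=0.1, num_iterations=1000)
print("Final cost:", cost_path[-1])
Plotting cost_path afterwards is a quick way to confirm the cost really does go down.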
Making Predictions
def predict(new_X, theta):
    # Add the bias term to the new data
    new_X_with_bias = np.c_[np.ones(new_X.shape[0]), new_X]
    # Get probabilities
    probabilities = proba_predict(new_X_with_bias, theta)
    # Convert to binary predictions (threshold at 0.5)
    class_predictions = (probabilities >= 0.5).astype(int)
    return probabilities, class_predictions
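And a quick check against the labels (again a rough sketch of my own, done on the training data rather than a proper held-out set):
# Predict on the original features and compare with the 0/1 labels
probs, preds = predict(X.values, theta)
print("Training accuracy:", np.mean(preds == Y.values))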
Comparing Both Implementations
In my view as a student, it is most important to start with the math so you understand what the model is doing behind the scenes. It also gives you the ability to analyze and think more clearly than someone who only knows to reach for logistic regression whenever a classification problem shows up. The deep dive gives you real control over the implementation. And of course, once I'm working as a data scientist I won't use the mathematical approach every time, only when I actually need it.
Practical Implications
After completing this exercise:
I could explain logistic regression better, to myself and to others, and answer questions in class more effectively
I gained confidence to implement custom variations of algorithms
My ability to debug model issues improved significantly
I developed a framework for understanding other ML algorithms
Conclusion: Why Every Data Scientist Should Do This
Implementing algorithms from scratch provides invaluable learning. My journey from:
from sklearn.linear_model import LogisticRegression
to understanding and implementing the complete mathematical foundation transformed me from someone who could apply machine learning to someone who truly understands it.
For anyone serious about machine learning, I highly recommend taking this journey with at least one algorithm. The insights you gain will elevate your understanding of all subsequent models you encounter.
The learning never stops!👋
I've open-sourced all my code in this GitHub repository to help you dive deeper:
⛏️ Mine vs. Rock: The Logistic Regression
Join the Learning Journey 🔹 Star the repo if you find it helpful
🔹 Open an Issue if you spot ways to improve
🔹 Fork it to add your own enhancements (maybe regularization or momentum?)
"The best way to learn is to collaborate – I'll be thrilled if you contribute!"