🔐 Classification Made Simple: A Beginner’s Guide to Logistic Regression in Python


“If linear regression predicts numbers, logistic regression predicts decisions.”
— Tilak Savani
🧠 Introduction
Logistic Regression is one of the most fundamental techniques in machine learning for classification problems. While linear regression predicts continuous values, logistic regression helps predict categories — like yes/no, pass/fail, or spam/not spam.
If you're stepping into classification, this is a great place to begin.
❓ What is Logistic Regression?
Logistic Regression is a supervised learning algorithm used to classify data into binary classes (e.g., 0 or 1).
Instead of fitting a straight line like linear regression, it fits an S-shaped curve — called the sigmoid — that maps predicted values between 0 and 1.
⚙️ How It Works
1. Sigmoid Function
The logistic regression model uses the sigmoid function:
σ(z) = 1 / (1 + e^(-z))
where z = mx + c — the same linear combination used in linear regression.
If σ(z) ≥ 0.5, we predict class 1
If σ(z) < 0.5, we predict class 0
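The sigmoid and the 0.5 cut-off can be sketched in a few lines of NumPy (a standalone illustration, separate from the full example later in the post):

```python
import numpy as np

def sigmoid(z):
    """Map any real value z to the (0, 1) interval."""
    return 1 / (1 + np.exp(-z))

# Large negative z -> probability near 0; large positive z -> near 1
print(sigmoid(-4))   # ~0.018
print(sigmoid(0))    # exactly 0.5
print(sigmoid(4))    # ~0.982

# Thresholding at 0.5 turns each probability into a class label
z = np.array([-2.0, -0.5, 0.3, 3.0])
labels = (sigmoid(z) >= 0.5).astype(int)
print(labels)        # [0 0 1 1]
```

Notice that the label flips exactly where z changes sign, since σ(0) = 0.5.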
2. Decision Boundary
The decision boundary is the threshold (usually 0.5) that splits the two classes. You can adjust this threshold to make the model more or less strict depending on your use case.
📦 Use Case: Will a Student Pass?
Let’s predict whether a student will pass an exam based on how many hours they studied.
Label:
1 = Pass
0 = Fail
Example dataset:
| Hours Studied | Passed |
| --- | --- |
| 1.0 | 0 |
| 2.0 | 0 |
| 3.0 | 0 |
| 4.0 | 1 |
| 5.0 | 1 |
| 6.0 | 1 |
🧪 Code Implementation in Python
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Training data
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Train the model
model = LogisticRegression()
model.fit(X, y)

# Predicted probabilities over a smooth range of study hours
X_test = np.linspace(0, 7, 100).reshape(-1, 1)
y_prob = model.predict_proba(X_test)[:, 1]

# Plot the sigmoid curve, the decision boundary, and the training points
plt.plot(X_test, y_prob, label="Probability of Passing")
plt.axhline(0.5, color='red', linestyle='--', label='Decision Boundary (0.5)')
plt.scatter(X, y, c=y, cmap='bwr', label='Training Data')
plt.xlabel("Hours Studied")
plt.ylabel("Probability / Outcome")
plt.title("Logistic Regression: Pass Prediction")
plt.legend()
plt.show()

# Predict for a new value
new_hours = np.array([[3.5]])
prediction = model.predict(new_hours)[0]
print(f"Prediction for 3.5 hours studied: {'Pass' if prediction == 1 else 'Fail'}")
```
📊 Output & Interpretation
Prediction for 3.5 hours studied: Pass
As the number of hours increases, so does the predicted probability of passing. At 3.5 hours — midway between the highest failing and lowest passing examples — the probability sits essentially at the 0.5 threshold, and the model tips the prediction to Pass.
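To see how close that prediction is to the boundary, you can ask the fitted model for the raw probability instead of just the label (retraining on the same toy data for a self-contained snippet):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Probability of passing after 3.5 hours of study
p = model.predict_proba([[3.5]])[0, 1]
print(f"P(pass | 3.5 hours) = {p:.3f}")
```

Because 3.5 sits exactly between the two groups of training points, the probability comes out very close to 0.5 — a reminder that `predict` hides how confident (or unconfident) the model actually is.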
🌐 Real-World Applications
| Domain | Use Case |
| --- | --- |
| Healthcare | Disease diagnosis (sick or healthy) |
| Finance | Loan approval (yes or no) |
| Marketing | Email open prediction (open or not) |
| HR | Employee retention (stay or leave) |
🧩 Final Thoughts
Logistic Regression is a powerful yet simple classification algorithm that is perfect for beginners. It’s interpretable, fast, and great for binary outcomes. Once you master it, you’ll be ready to explore more advanced classifiers like Decision Trees, Random Forests, and Neural Networks.
✉️ Subscribe to my blog!
Stay tuned for more hands-on ML tutorials and projects. 🚀 Follow me on Hashnode and let's grow together in the world of Artificial Intelligence.