🔐 Classification Made Simple: A Beginner’s Guide to Logistic Regression in Python

Tilak Savani

“If linear regression predicts numbers, logistic regression predicts decisions.”

— Tilak Savani



🧠 Introduction

Logistic Regression is one of the most fundamental techniques in machine learning for classification problems. While linear regression predicts continuous values, logistic regression helps predict categories — like yes/no, pass/fail, or spam/not spam.

If you're stepping into classification, this is a great place to begin.


❓ What is Logistic Regression?

Logistic Regression is a supervised learning algorithm used to classify data into binary classes (e.g., 0 or 1).

Instead of fitting a straight line like linear regression, it fits an S-shaped curve — called the sigmoid — that maps predicted values between 0 and 1.


⚙️ How It Works

1. Sigmoid Function

The logistic regression model uses the sigmoid function:

    σ(z) = 1 / (1 + e^(-z))

Where z = mx + c, the same linear expression used in linear regression (m is the weight on the input feature and c is the intercept).

  • If σ(z) ≥ 0.5, we predict class 1

  • If σ(z) < 0.5, we predict class 0
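
To make the formula concrete, here is a minimal NumPy sketch of the sigmoid (the slope m and intercept c values below are purely illustrative, not fitted):

import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

# z = m*x + c, with illustrative values for m and c
m, c = 1.5, -5.0
hours = np.array([1.0, 3.0, 5.0, 7.0])
print(sigmoid(m * hours + c))  # probabilities rise smoothly from near 0 toward 1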

2. Decision Boundary

The decision boundary is the threshold (usually 0.5) that splits the two classes. You can adjust this threshold to make the model more or less strict depending on your use case.
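
For example, given a set of predicted probabilities, changing the cut-off changes which observations get labelled 1. A quick sketch (the probability values here are hypothetical):

import numpy as np

# Hypothetical predicted probabilities from a classifier
probs = np.array([0.20, 0.45, 0.55, 0.80])

default_preds = (probs >= 0.5).astype(int)  # standard 0.5 threshold -> [0 0 1 1]
strict_preds = (probs >= 0.7).astype(int)   # stricter threshold     -> [0 0 0 1]

print(default_preds, strict_preds)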


📦 Use Case: Will a Student Pass?

Let’s predict whether a student will pass an exam based on how many hours they studied.

Label:

  • 1 = Pass

  • 0 = Fail

Example dataset:

Hours Studied | Passed
1.0           | 0
2.0           | 0
3.0           | 0
4.0           | 1
5.0           | 1
6.0           | 1

🧪 Code Implementation in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Training Data
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Train the model
model = LogisticRegression()
model.fit(X, y)

# Prediction
X_test = np.linspace(0, 7, 100).reshape(-1, 1)
y_prob = model.predict_proba(X_test)[:, 1]
y_pred = model.predict(X_test)

# Plotting
plt.plot(X_test, y_prob, label="Probability of Passing")
plt.axhline(0.5, color='red', linestyle='--', label='Decision Boundary (0.5)')
plt.scatter(X, y, c=y, cmap='bwr', label='Training Data')
plt.xlabel("Hours Studied")
plt.ylabel("Probability / Outcome")
plt.title("Logistic Regression: Pass Prediction")
plt.legend()
plt.show()

# Predict for a new value
new_hours = np.array([[3.5]])
prediction = model.predict(new_hours)[0]
print(f"Prediction for 3.5 hours studied: {'Pass' if prediction == 1 else 'Fail'}")

📊 Output & Interpretation

Prediction for 3.5 hours studied: Pass

As the number of hours increases, so does the probability of passing. At 3.5 hours, the model predicts the student is likely to pass — just over the 0.5 threshold.
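
If you want to see the underlying probability rather than just the class label, you can add one more line to the script above using predict_proba:

# Probability of passing for 3.5 hours of study (continues the script above)
prob_pass = model.predict_proba(np.array([[3.5]]))[0, 1]
print(f"Probability of passing with 3.5 hours: {prob_pass:.2f}")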


🌐 Real-World Applications

Domain     | Use Case
Healthcare | Disease diagnosis (sick or healthy)
Finance    | Loan approval (yes or no)
Marketing  | Email open prediction (open or not)
HR         | Employee retention (stay or leave)

🧩 Final Thoughts

Logistic Regression is a powerful yet simple classification algorithm that is perfect for beginners. It’s interpretable, fast, and great for binary outcomes. Once you master it, you’ll be ready to explore more advanced classifiers like Decision Trees, Random Forests, and Neural Networks.


✉️ Subscribe to my blog!

Stay tuned for more hands-on ML tutorials and projects. 🚀 Follow me on Hashnode and let's grow together in the world of Artificial Intelligence.
