Logistic Regression: The Classifier That Thinks Like a Regression


Imagine this:
You’re sitting across from your doctor. He glances at your chart, then looks up and asks,
“You’re trying to predict life and death with a straight line?”
You’ve just been diagnosed with cancer. The decisions ahead aren’t vague guesses. They rely on measurable variables — tumor size, patient age, treatment response.
But here’s the twist: predicting survival isn’t about hard yes-or-no answers. It’s about how likely survival is as each factor shifts.
A simple yes or no can’t capture that nuance. You need something that bends with reality: a tool that shows how, as treatment becomes more effective, the probability of survival gently rises.
That’s what logistic regression does. It doesn’t force the data into a line — it curves toward the truth, mapping uncertainty into meaningful probability.
Not just for patients, but for every binary decision that needs clarity backed by data.
Origin Story
What do sea urchins, war wounds, and email spam have in common?
They all helped shape logistic regression.
In 1845, Pierre Verhulst noticed that populations don’t grow forever. He modeled it with an S-shaped curve — now called the sigmoid function.
A century later, doctors and engineers used this curve to make yes/no predictions — like whether a patient survives or a machine fails.
Then came spam filters.
The same curve now helps sort your inbox. From sea creatures to spam detection — not bad for a 19th-century idea.
Why Is It Called “Logistic Regression” if It Classifies?
Some do ask, and the answer is yes: logistic regression is used for classification. But it’s still called regression because, under the hood, the model predicts a number, not a class. That number is a probability. And it starts with this familiar equation:
$$z = \beta_0 + \beta_1 x$$
This is just a straight line. It might output:
4.2, -1.7, 9.9
But since probabilities can’t be less than 0 or greater than 1, we need a function that keeps predictions within that range. That’s where the sigmoid function comes in.
Enter the Sigmoid Function
Here’s what happens: we pass the linear output z into the sigmoid function, which transforms it into a number between 0 and 1.
$$\text{Sigmoid}(z) = \frac{1}{1 + e^{-z}}$$
This squashing turns -∞ to +∞ into the range (0, 1).
But more importantly — it creates a smooth curve of confidence.
A value near 0.5? The model’s unsure.
Close to 1? It’s pretty confident this is Class 1.
Close to 0? It’s leaning toward Class 0.
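To see that squashing in action, here is a tiny sketch in plain NumPy that pushes the sample outputs from the straight line above through the sigmoid:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Raw outputs from the straight line can be any real number...
z = np.array([4.2, -1.7, 9.9])

# ...but the sigmoid squashes them all into (0, 1)
print(sigmoid(z))  # approximately [0.985, 0.154, 0.99995]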
What Logistic Regression Really Predicts: Log-Odds
Logistic regression doesn’t just predict a probability out of nowhere.
It models the log-odds of the event happening:
$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x$$
Why log-odds? Because a probability is trapped between 0 and 1, it can’t keep responding linearly to a predictor, but the odds p / (1 − p) can grow without limit.
A 10x increase in salary might keep multiplying the odds of buying a house, yet it can never push the probability itself past 1.
Taking the log keeps the math symmetrical around even odds and stretches the output across the full range of real numbers (good for regression).
Then we just invert the log-odds back into a probability using sigmoid.
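In fact, the sigmoid is exactly what you get when you solve the log-odds equation for p:
$$\log\left(\frac{p}{1 - p}\right) = z \quad\Rightarrow\quad \frac{p}{1 - p} = e^{z} \quad\Rightarrow\quad p = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}$$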
Think of it like this:
Logistic regression speaks in log-odds.
The sigmoid acts as its translator.
You, the human, interpret it as: “There’s a 92% chance this email is spam.”
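To make that concrete, here is a minimal sketch with a made-up log-odds score of 2.5 (not from any real spam model); it translates to roughly that 92%:

import numpy as np

log_odds = 2.5                              # hypothetical model output for one email
probability = 1 / (1 + np.exp(-log_odds))   # the sigmoid "translation"
print(round(probability, 2))                # 0.92 -> "92% chance this email is spam"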
Why Mean Squared Error Fails — And Cross-Entropy Wins
From the last article on regression, you might think that since logistic regression also has “regression” in its name, it uses mean squared error (MSE) as its loss.
But it doesn’t.
In fact, many beginners assume that all machine learning models use MSE to train — because it’s common in linear regression.
That’s a trap.
It’s fine for regression, but here’s why it fails in classification: because predicted probabilities are squeezed between 0 and 1, the squared error on any single example can never exceed 1, so MSE barely notices when the model is confidently wrong.
Cross-entropy, however, punishes the model much more for saying:
“I’m 99% sure it’s Class 1” when the correct answer is Class 0.
$$\text{Loss} = -\left[y \log(p) + (1 - y) \log(1 - p)\right]$$
This loss function makes your model more cautious, more calibrated, and more accountable for its guesses.
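Here is a quick sketch of that asymmetry, with values picked purely for illustration: the true label is Class 0, and the model confidently predicts 0.99.

import numpy as np

y_true = 0      # correct answer: Class 0
y_pred = 0.99   # the model is confidently wrong

mse = (y_true - y_pred) ** 2
cross_entropy = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(f"MSE: {mse:.2f}")                      # 0.98 -> a mild penalty, capped at 1
print(f"Cross-entropy: {cross_entropy:.2f}")  # 4.61 -> a much heavier penalty

And the closer that wrong prediction gets to 1.0, the more the cross-entropy penalty blows up, while MSE stays capped.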
We Use Gradient Descent to Learn
Like linear regression, we use Gradient Descent to adjust the weights and bias to minimize the loss over time.
Let’s bring this to life with code.
Logistic Regression using Gradient Descent (With Code + Visualization)
Here’s a basic implementation of logistic regression from scratch in Python:
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Data (study hours vs. pass/fail labels)
X = np.array([1, 2, 3, 4, 5])
y = np.array([0, 0, 0, 1, 1])

# Step 2: Sigmoid
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 3: Initialize weight, bias, and training settings
w, b = 0.0, 0.0
lr, epochs, n = 0.1, 1000, len(X)

# Step 4: Train with gradient descent
for epoch in range(epochs):
    z = w * X + b
    y_pred = sigmoid(z)

    # Gradients of the cross-entropy loss with respect to w and b
    dw = (1/n) * np.dot(X, (y_pred - y))
    db = (1/n) * np.sum(y_pred - y)

    w -= lr * dw
    b -= lr * db

    if epoch % 100 == 0:
        loss = -(1/n) * np.sum(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
        print(f"Epoch {epoch}: Loss={loss:.4f}, w={w:.4f}, b={b:.4f}")

# Final Model
print(f"\nFinal w: {w:.4f}, Final b: {b:.4f}")

# Plot the data and the fitted sigmoid curve
plt.scatter(X, y, color='blue')
x_line = np.linspace(0, 6, 100)
y_line = sigmoid(w * x_line + b)
plt.plot(x_line, y_line, color='red')
plt.title('Logistic Regression Curve')
plt.xlabel('Study Hours')
plt.ylabel('Probability of Passing')
plt.grid(True)
plt.show()
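As a sanity check, you can compare the hand-rolled model against a library implementation. The sketch below assumes scikit-learn is installed (it is not used anywhere above):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # same study hours, as a 2-D column
y = np.array([0, 0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Probability of passing after 3.5 hours of study
print(model.predict_proba([[3.5]])[0, 1])

Note that scikit-learn applies L2 regularization by default, so its fitted coefficients won't exactly match the w and b learned from scratch.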