ROC and AUC Fundamentals: A Simple Guide for Beginners

Krishna Dwivedi
7 min read

1. Short Introduction on ROC Curve

In the world of machine learning, especially in binary classification tasks, it's important not just to know how accurate a model is, but how well it distinguishes between classes — like spam vs. not spam, or fraud vs. legitimate transactions.

That’s where the ROC Curve (Receiver Operating Characteristic Curve) comes in.

The ROC curve is a graphical representation of a classifier’s performance across all possible classification thresholds. It plots:

  • True Positive Rate (TPR) – also known as recall or sensitivity – on the Y-axis

  • False Positive Rate (FPR) – the rate of false alarms – on the X-axis

As you move the decision threshold from 0 to 1, you trace out a curve that shows how the balance between correctly identifying positives and mistakenly labeling negatives changes.

A ROC curve that hugs the top-left corner represents a strong classifier. One that lies along the diagonal indicates performance no better than random guessing.

In short, the ROC curve helps us visualize the trade-off between sensitivity and false alarms, making it an essential tool for evaluating and comparing models.
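For readers who like to see this concretely, here is a minimal sketch of how the raw ingredients of a ROC curve can be obtained with scikit-learn (the labels and probabilities below are invented purely for illustration):

```python
# Minimal sketch: ROC curve points from true labels and predicted probabilities.
# The arrays below are made up for illustration only.
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 0, 1, 0, 1, 1, 1]                      # actual classes
y_score = [0.1, 0.3, 0.35, 0.4, 0.6, 0.65, 0.8, 0.9]    # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)        # one (FPR, TPR) pair per threshold
print(list(zip(thresholds, fpr, tpr)))
print("AUC:", roc_auc_score(y_true, y_score))            # area under this curve (see Section 6)
```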

2. The Obesity Story: A Classification Experiment

Let’s imagine we’re building a machine learning model to classify whether a person is obese or not obese based on their weight. Sounds simple, right? Heavier people are obese, lighter people are not.

But in real life, it’s not that straightforward.

  • Some people have a high weight but a lot of muscle mass - not obese.

  • Some people have a low weight but a high fat percentage - actually obese.

The Goal

Build a binary classifier using logistic regression that takes weight as input and predicts:

  • 1 → Obese

  • 0 → Not obese

We'll then analyze the model using a ROC curve to see how well it separates the two classes.

Note that ROC and AUC are not specific to logistic regression; they can be applied to any model that outputs a score or probability for the positive class.
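As a hedged sketch of what this setup could look like in code (the weights and labels below are invented toy data, not a real obesity dataset), one might do something like this with scikit-learn:

```python
# Toy sketch of the obesity classifier: weight (kg) -> obese (1) / not obese (0).
# The data is invented for illustration and deliberately includes "confusing" points:
# a heavy but muscular person labelled 0, and a lighter but high-fat person labelled 1.
import numpy as np
from sklearn.linear_model import LogisticRegression

weights = np.array([[55], [62], [68], [95], [72], [80], [88], [110]])  # feature: weight
labels  = np.array([  0,    0,    0,    0,    1,    1,    1,     1 ])  # 1 = obese

model = LogisticRegression()
model.fit(weights, labels)

# Predicted probability of being obese for a few unseen weights
test_weights = np.array([[60], [75], [100]])
print(model.predict_proba(test_weights)[:, 1])
```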

3. The Logistic Regression Predictor Model

In logistic regression, we apply the sigmoid (logistic) function to a linear combination of the input features to produce probability values between 0 and 1 that indicate the likelihood of a data point belonging to a particular class (e.g., obese or not obese).

However, to make a final classification (yes or no), we need to compare these probabilities against a decision threshold.
By default, this threshold is often set at 0.5:

  • If the predicted probability > 0.5 → classify as positive (e.g., obese),

  • Else → classify as negative (not obese).

Say our model looks something like the sketch below.
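Concretely, a minimal sketch of what such a model computes might look like this (the coefficient and intercept below are made-up numbers chosen for illustration, not values learned from real data):

```python
# Sketch of the logistic regression decision rule on a single feature (weight).
# w and b are illustrative, hand-picked values, not learned parameters.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.15, -12.0          # hypothetical coefficient and intercept
threshold = 0.5             # default decision threshold

for weight in [60, 75, 85, 100]:
    p = sigmoid(w * weight + b)          # probability of "obese"
    label = 1 if p > threshold else 0    # compare against the threshold
    print(f"weight={weight:>3} kg -> P(obese)={p:.2f} -> predicted class {label}")
```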

But here’s the critical question:
How do we choose the most appropriate decision threshold?

Choosing the wrong threshold could lead to too many false positives or false negatives — especially in real-world cases where data isn’t perfectly separable.

To explore this, let’s consider the above model that has already been trained on weight data. We'll feed some test data points into it and observe how the model classifies them based on different thresholds.

This will help us understand how ROC curves come into play when evaluating model performance across all possible thresholds, rather than just one.

Let us see, based on the data distribution we assumed in Section 2, how the problems arise.

Looking at the test data points, we can see that the model does not classify all of them correctly. This becomes clearer when we analyze further and build the confusion matrix.
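Here is a minimal sketch of how that confusion matrix can be built with scikit-learn; the true labels and predicted probabilities below are made up for illustration, since the original toy data is not reproduced here:

```python
# Confusion matrix at threshold 0.5 on a tiny, made-up test set.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])                           # actual classes
probs  = np.array([0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.85])   # model outputs

y_pred = (probs > 0.5).astype(int)                                     # apply the threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
# With this toy data: one false positive (0.60) and one false negative (0.40).
```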

These misclassifications arose with the decision threshold set at 0.5, which was essentially an arbitrary guess.

For better results, we need a rationale for selecting the threshold.

This is where the ROC curve comes into the picture.

4. Plotting the ROC Curve

Now, let us set the decision threshold to 0.1. This means we are being extremely cautious and want to maximize the detection of all obese individuals — it's crucial that we don't miss anyone who is actually obese.

However, this comes at a cost:
By lowering the threshold so much, we also end up classifying many non-obese individuals as obese, leading to a high false positive rate.

Thus, by lowering the threshold this far, the number of false negatives drops to 0 in our example.

Since there are still misclassifications present, let us now set the decision threshold to 0.9.

Thus, by increasing the threshold, the number of false positives drops to 0.
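A small sketch (reusing the same style of made-up labels and probabilities as before) shows the effect of sweeping the threshold; with this particular toy data, the false negatives vanish at 0.1 and the false positives vanish at 0.9:

```python
# Effect of the decision threshold on false positives and false negatives,
# computed on a tiny, made-up test set.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
probs  = np.array([0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.85])

for threshold in (0.1, 0.5, 0.9):
    y_pred = (probs > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    print(f"threshold={threshold}: FP={fp}, FN={fn}")
# threshold=0.1 -> FP=4, FN=0   (everyone flagged as obese, no misses)
# threshold=0.9 -> FP=0, FN=4   (no false alarms, but many misses)
```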

But analyzing confusion matrices one threshold at a time quickly becomes overwhelming, especially when datasets are large.

So we resort to a better method of analysis: the ROC curve.

To plot the ROC curve, we vary the decision threshold from 1.0 to 0.0 and compute the following metrics at each step:

1. True Positive Rate (TPR)

Also known as Recall or Sensitivity:

TPR = TP / (TP + FN)

  • TP = True Positives (correctly predicted positives)

  • FN = False Negatives (actual positives predicted as negative)

This becomes the Y-axis value on the ROC curve.

2. False Positive Rate (FPR)

FPR = FP / (FP + TN)

  • FP = False Positives (actual negatives predicted as positive)

  • TN = True Negatives (correctly predicted negatives)

This becomes the X-axis value on the ROC curve.

Each threshold (each row of such a table) gives a point on the ROC curve:

  • (FPR, TPR)

By connecting these points as the threshold changes, we get the full ROC curve.
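The table of points is easy to reconstruct in code. Here is a hedged sketch (again with made-up labels and probabilities) that applies the TPR and FPR formulas above at a range of thresholds:

```python
# Build the (threshold, FPR, TPR) table by hand, using the formulas above.
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
probs  = np.array([0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.85])

print("threshold   FPR    TPR")
for threshold in np.arange(1.0, -0.01, -0.1):       # sweep from 1.0 down to 0.0
    y_pred = (probs > threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / (tp + fn)     # recall / sensitivity
    fpr = fp / (fp + tn)     # false alarm rate
    print(f"{threshold:9.1f}  {fpr:.2f}   {tpr:.2f}")
```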

An example ROC curve can be plotted as follows.
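Here is a minimal plotting sketch, assuming scikit-learn and matplotlib are available and reusing the made-up probabilities from earlier:

```python
# Plot a ROC curve for the toy probabilities using scikit-learn and matplotlib.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
probs  = [0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.85]

fpr, tpr, thresholds = roc_curve(y_true, probs)

plt.plot(fpr, tpr, marker="o", label="ROC curve")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guessing")  # the diagonal
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Example ROC curve (toy data)")
plt.legend()
plt.show()
```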

When looking at a ROC curve, you’re essentially seeing how the model performs across all classification thresholds. Each point on the curve represents a different threshold, showing the trade-off between:

  • True Positive Rate (TPR) — how many actual positives are correctly identified

  • False Positive Rate (FPR) — how many actual negatives are wrongly labeled as positive

5. Analysis of the ROC Curve

Visual Analysis

  • The top-left corner of the ROC space is the ideal point — it means:

    • High TPR (most positives are caught)

    • Low FPR (few negatives are misclassified)

  • A point near the diagonal line (y = x) indicates performance close to random guessing.

So visually, the closer the curve hugs the top-left, the better your model is at distinguishing between classes.

Judging the Trade-Off

Choosing a good threshold means finding the right balance between:

  • Catching more true positives (higher TPR)

  • Avoiding false alarms (lower FPR)

This is where application context comes in:

  • In medical diagnoses, missing a positive case can be dangerous → prioritize high TPR, even if FPR is higher.

  • In spam detection, too many false positives (mislabeling good emails) is annoying → prefer lower FPR, even if TPR drops a bit.

So, from the ROC curve:

  • If you're okay with more false positives to catch all positives, move rightward on the curve.

  • If you want fewer false positives and are okay missing some positives, stay more toward the left.
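If you want to turn this visual judgment into code, one simple heuristic (only one of many, shown here as an illustration rather than a universal rule) is to pick the threshold whose ROC point lies closest to the ideal top-left corner:

```python
# One simple heuristic: choose the threshold whose (FPR, TPR) point is
# closest to the ideal top-left corner (FPR=0, TPR=1).
import numpy as np
from sklearn.metrics import roc_curve

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
probs  = [0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.85]

fpr, tpr, thresholds = roc_curve(y_true, probs)
distances = np.sqrt(fpr**2 + (1 - tpr)**2)      # distance of each point from (0, 1)
best = np.argmin(distances)
print(f"chosen threshold = {thresholds[best]:.2f} (FPR={fpr[best]:.2f}, TPR={tpr[best]:.2f})")
# In practice the "best" threshold depends on the application: medical screening
# and spam filtering weigh false positives and false negatives very differently.
```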

6. Area Under Curve (AUC)

AUC stands for Area Under the ROC Curve. It gives a single number that summarizes the performance of a classification model across all thresholds.

  • AUC = 1.0 → Perfect model (always ranks positives above negatives)

  • AUC = 0.5 → No better than random guessing

  • AUC < 0.5 → Worse than random (model might be inverted)

Intuition:

AUC tells you the likelihood that the model will rank a randomly chosen positive instance higher than a randomly chosen negative one.

In short, the higher the AUC, the better the model is at distinguishing between the two classes — regardless of the threshold.
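This ranking interpretation can be checked directly with a short sketch on invented data: the fraction of positive-negative pairs in which the positive example gets the higher score (ties counted as half) matches roc_auc_score:

```python
# Check the ranking interpretation of AUC on made-up scores:
# AUC = P(random positive is scored higher than random negative).
from itertools import product
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
probs  = [0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.85]

pos = [p for p, y in zip(probs, y_true) if y == 1]
neg = [p for p, y in zip(probs, y_true) if y == 0]

# Count pairs where the positive outranks the negative (ties count as 0.5).
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
print("pairwise estimate:", wins / (len(pos) * len(neg)))
print("roc_auc_score:    ", roc_auc_score(y_true, probs))
```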

7. Use of AUC in Comparing Multiple Models

When you have multiple models and you want to know which one performs better, the AUC score is a powerful tool because:

1. Threshold-Independent

AUC evaluates how well a model ranks predictions across all possible thresholds — not just at one fixed cutoff (like 0.5).
This gives a more complete picture of model performance.

2. Handles Class Imbalance Well

In datasets where one class dominates (e.g., 95% non-obese, 5% obese), metrics like accuracy can be misleading.
AUC focuses on how well the model separates the two classes, regardless of their proportions.

3. Easy Comparison

You can directly compare AUC values:

  • Let Model A: AUC = 0.87

  • And Model B: AUC = 0.79

→ Model A is better at distinguishing between the classes.
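In code, such a comparison takes only a few lines once each model can produce predicted probabilities. The sketch below uses a synthetic dataset and two off-the-shelf scikit-learn models purely as an example; the AUC values it prints will differ from the illustrative 0.87 / 0.79 above.

```python
# Compare two classifiers by AUC on a synthetic, imbalanced dataset (illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1],
                           random_state=42)            # roughly 90% negative class
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model_a = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_b = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

for name, model in [("Model A (logistic regression)", model_a),
                    ("Model B (decision tree)", model_b)]:
    test_probs = model.predict_proba(X_test)[:, 1]     # probability of the positive class
    print(f"{name}: AUC = {roc_auc_score(y_test, test_probs):.3f}")
```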

8. Final Thoughts

Understanding ROC curves and AUC is essential for evaluating the performance of classification models - especially in real-world scenarios where data is messy, thresholds matter, and trade-offs are inevitable.

While accuracy may give a quick snapshot, ROC and AUC provide a deeper, more reliable look into how well your model separates classes across different thresholds.

Whether you're working with medical diagnoses, fraud detection, or email spam filters, ROC and AUC help you move beyond simple correctness into meaningful decision-making.

Thanks for reading!
If you found this article helpful, feel free to share it, leave a comment, or connect to discuss more about machine learning evaluation metrics.
