AUC-ROC Curve in Machine Learning

aditi mishra
3 min read

In machine learning, evaluating a model's performance is as important as building it. The ROC-AUC curve, which works well for binary classification, is a performance metric that provides a comprehensive picture of how well a model separates the two classes.

What is ROC?

ROC stands for Receiver Operating Characteristic. The ROC curve is a graphical tool used to evaluate the performance of a binary classification model. It plots the True Positive Rate (TPR) on the Y-axis against the False Positive Rate (FPR) on the X-axis at various threshold settings. Each point on the ROC curve corresponds to a different threshold. The closer the curve is to the top-left corner, the better the model is at distinguishing between the two classes.

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)
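
A minimal sketch in Python (the counts below are made-up example values) showing how these two rates are computed:

```python
# Compute TPR and FPR from raw confusion-matrix counts.

def tpr(tp, fn):
    """True Positive Rate: fraction of actual positives correctly flagged."""
    return tp / (tp + fn)

def fpr(fp, tn):
    """False Positive Rate: fraction of actual negatives wrongly flagged."""
    return fp / (fp + tn)

# Hypothetical counts at one threshold
tp, fn, fp, tn = 80, 20, 10, 90
print(f"TPR = {tpr(tp, fn):.2f}")  # 0.80
print(f"FPR = {fpr(fp, tn):.2f}")  # 0.10
```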

Common terms used in the AUC-ROC curve:

  1. AUC (Area Under the Curve): It tells us how well a machine learning model can tell the difference between two classes (like "yes" vs. "no", or "spam" vs. "not spam").

  2. ROC (Receiver Operating Characteristic): The ROC is a graph that helps you understand how good your classification model is at separating two classes (like yes/no, true/false, spam/not spam). It shows the trade-off between TPR and FPR.

  3. TPR (True Positive Rate): It tells us the proportion of the positive class that got correctly classified. For example: what proportion of spam emails were correctly classified as spam by the model.

  4. FPR (False Positive Rate): It tells us what proportion of actual negative instances were incorrectly classified as positive. For example: how many non-spam (normal) emails were wrongly marked as spam.

  5. TNR (True Negative Rate): It tells us the proportion of the negative class that got correctly classified. For example: how many normal emails were correctly identified as not spam.

Each point on the ROC curve represents a different threshold for classifying a prediction as positive.
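
To make this concrete, here is a small sketch that assumes scikit-learn is installed and uses made-up labels and scores; each row of output is one (FPR, TPR) point that would be plotted on the ROC curve:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up ground-truth labels and predicted probabilities for the positive class
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

# Each threshold yields one (FPR, TPR) point on the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```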

Confusion matrix:

A confusion matrix is like a scorecard for your model. It shows how well the model is doing by comparing its predictions with the actual results.

From the confusion matrix you can determine the following (a short code sketch follows the list):

  • Accuracy = (TP + TN) / Total

  • Precision = TP / (TP + FP)

  • Recall (TPR) = TP / (TP + FN)

  • F1 Score = Harmonic mean of precision and recall
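
A short sketch, assuming scikit-learn is available and using made-up predictions, of how these metrics fall out of the confusion matrix:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# Made-up actual and predicted labels
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# sklearn's confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")

print("Accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / Total
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of the two
```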

Understanding the ROC-AUC curve with an example:

Consider a model for email spam classification. The model assigns a probability score to each email and, based on a chosen threshold, decides whether to label it as spam or not. If we set a low threshold, the model flags more emails as spam, which increases the True Positive Rate (TPR) but also raises the False Positive Rate (FPR), as more non-spam emails get wrongly marked. A high threshold reduces both TPR and FPR. The ROC curve visualizes this trade-off by plotting TPR against FPR for different thresholds. A curve closer to the top-left indicates better performance. The AUC (Area Under the Curve) summarizes this performance: AUC = 1 means perfect classification, AUC = 0.5 means random guessing, and anything below 0.5 is worse than random. By comparing AUC values, we can decide which model is more effective.
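
As a rough illustration, the sketch below scores two hypothetical models with scikit-learn's roc_auc_score on made-up labels; the model whose scores separate the two classes better gets the higher AUC:

```python
from sklearn.metrics import roc_auc_score

# Made-up ground truth and predicted probabilities from two hypothetical models
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores_model_a = [0.2, 0.3, 0.6, 0.8, 0.1, 0.9, 0.4, 0.7]     # separates classes cleanly
scores_model_b = [0.4, 0.5, 0.45, 0.6, 0.3, 0.55, 0.65, 0.5]  # much weaker separation

print("Model A AUC:", roc_auc_score(y_true, scores_model_a))  # 1.0 (perfect ranking)
print("Model B AUC:", roc_auc_score(y_true, scores_model_b))  # roughly 0.66
```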
