AUC-ROC Curve in Machine Learning


In machine learning, evaluating the performance of a model is as important as building it. The ROC-AUC curve is a performance metric for binary classification that provides a comprehensive picture of how well a model separates the two classes.
What is ROC?
ROC stands for Receiver Operating Characteristic. The ROC curve is a graphical tool used to evaluate the performance of a binary classification model. It plots the True Positive Rate (TPR) on the Y-axis against the False Positive Rate (FPR) on the X-axis at various threshold settings; each point on the curve corresponds to a different threshold. The closer the curve is to the top-left corner, the better the model is at distinguishing between the two classes.
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
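To make these two formulas concrete, here is a minimal sketch (with made-up labels and scores, not from the article) that counts TP, FP, FN and TN at one hypothetical threshold and computes TPR and FPR from them:

```python
import numpy as np

# Hypothetical example data: actual labels (1 = positive) and model probability scores
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.4, 0.65, 0.8, 0.3, 0.55, 0.2, 0.1])

threshold = 0.5                                # assumed cut-off for this illustration
y_pred = (y_score >= threshold).astype(int)    # label as positive if score >= threshold

# Count the four outcomes
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))

tpr = tp / (tp + fn)   # True Positive Rate
fpr = fp / (fp + tn)   # False Positive Rate
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Changing the threshold in this sketch changes the counts, and therefore moves the (FPR, TPR) point along the ROC curve.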
Common terms used in the AUC-ROC curve:
AUC (Area Under the Curve): It tells us how well a machine learning model can tell the difference between two classes (like "yes" vs. "no", or "spam" vs. "not spam").
ROC (Receiver Operating Characteristic): A graph that shows how good your classification model is at separating two classes (like yes/no, true/false, spam/not spam). It shows the trade-off between TPR and FPR.
TPR (True Positive Rate): The proportion of the positive class that got correctly classified. For example: what proportion of spam emails were correctly classified as spam by the model.
FPR (False Positive Rate): The proportion of actual negative instances that were incorrectly classified as positive. For example: how many non-spam (normal) emails were wrongly marked as spam.
TNR (True Negative Rate): The proportion of the negative class that got correctly classified. For example: how many normal emails were correctly identified as not spam.
Each point on the ROC curve represents a different threshold for classifying a prediction as positive.
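Since each threshold gives one (FPR, TPR) point, the full curve is traced by sweeping over thresholds. A short sketch of this (using assumed toy data and scikit-learn's roc_curve and roc_auc_score, which do the threshold sweep for you):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and model scores for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_score = [0.95, 0.40, 0.70, 0.85, 0.30, 0.60, 0.55, 0.20, 0.80, 0.45]

# roc_curve returns one (FPR, TPR) pair per threshold it tries
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

plt.plot(fpr, tpr, marker="o", label=f"model (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess (AUC = 0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```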
Confusion matrix:
A confusion matrix is like a scorecard for your model. It shows how well the model is doing by comparing its predictions with the actual results.
From the confusion matrix you can determine the following (a short code sketch follows the list):
Accuracy = (TP + TN) / Total
Precision = TP / (TP + FP)
Recall (TPR) = TP / (TP + FN)
F1 Score = 2 × (Precision × Recall) / (Precision + Recall), the harmonic mean of precision and recall
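A minimal sketch (with assumed labels, not from the article) computing the confusion matrix and the metrics listed above with scikit-learn:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# Hypothetical actual labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
```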
Understanding the ROC-AUC curve with an example:
Consider a model for email spam classification. The model assigns a probability score to each email and, based on a chosen threshold, decides whether to label it as spam or not. If we set a low threshold, the model flags more emails as spam — this increases the True Positive Rate (TPR) but also raises the False Positive Rate (FPR), as more non-spam emails get wrongly marked. A high threshold reduces both TPR and FPR. The ROC curve visualizes this trade-off by plotting TPR against FPR for different thresholds. A curve closer to the top-left indicates better performance. The AUC (Area Under the Curve) summarizes this performance: AUC = 1 means perfect classification, AUC = 0.5 means random guessing, and anything below 0.5 is worse than random. By comparing AUC values, we can decide which model is more effective.
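As a hedged sketch of that last step (comparing models by AUC), the snippet below trains two hypothetical classifiers on synthetic data and compares their ROC-AUC scores; the dataset and model choices are illustrative assumptions, not part of the original example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic binary classification data standing in for the spam example
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]   # probability of the positive class
    print(f"{name}: AUC = {roc_auc_score(y_test, scores):.3f}")
```

The model with the higher AUC is the one that better separates the two classes across all thresholds.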