Choosing the Right Metrics: Recall, Precision, PR Curve and ROC Curve Explained

Accurate evaluation of machine learning models is crucial for their success.

Imagine you're a doctor trying to diagnose a rare disease. You want to catch as many cases as possible (high recall) while avoiding misdiagnosing healthy people (high precision).

This is where recall, precision, and the PR and ROC curves come into play. But how do we measure and balance these metrics for optimal performance?

This article dives deep into recall, precision, PR curve, and ROC curve—essential tools for evaluating the accuracy of classification models.

Let's dive into it right now!

Understanding Recall and Precision

Recall and precision are two fundamental metrics in binary classification problems.

In scenarios where the cost of a false negative is high, such as in medical diagnostics, recall becomes a critical measure.

On the other hand, in situations where false positives carry severe consequences, such as in spam detection systems, precision is of utmost importance.

Recall

Recall, also known as sensitivity or true positive rate (TPR), is the proportion of actual positive instances that the model correctly identifies.

It measures the model's ability to catch all positive instances. A high recall indicates that the model captures most of the actual positive cases, reducing the risk of missing important instances.

Mathematically, recall is calculated as:

Recall = True Positives / (True Positives + False Negatives)

For example, if there were 100 people with a disease and the test correctly identified 80 of them, the recall would be 0.8.

Precision

Precision, on the other hand, is the proportion of positive predictions that were correct.

It measures the model's accuracy in its positive predictions. A high precision means that when the model predicts a positive instance, it is highly likely to be correct.

Precision is calculated as:

Precision = True Positives / (True Positives + False Positives)

If the test predicted that 50 people had the disease, but only 30 of them actually did, the precision would be 0.6.

Computing Recall and Precision

Let's see how we can compute recall and precision using the scikit-learn library in Python:

from sklearn.metrics import precision_score, recall_score

# Assume you have the true labels (y_true) and predicted labels (y_pred)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

# Calculate precision
precision = precision_score(y_true, y_pred)
print(f"Precision: {precision:.2f}")

# Calculate recall
recall = recall_score(y_true, y_pred)
print(f"Recall: {recall:.2f}")

Output:

Precision: 0.67
Recall: 0.80
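
To see where these numbers come from, you can cross-check them against the confusion matrix. The sketch below uses the same y_true and y_pred as above:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")   # TP=4, FP=2, FN=1, TN=3
print(f"Precision: {tp / (tp + fp):.2f}")      # 4 / (4 + 2) = 0.67
print(f"Recall: {tp / (tp + fn):.2f}")         # 4 / (4 + 1) = 0.80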

The Precision-Recall (PR) Curve

The PR curve is a powerful tool that plots the relationship between precision and recall across all possible thresholds. It provides a comprehensive view of a model's performance, highlighting the trade-offs between precision and recall.

Understanding the PR Curve

In a PR curve, precision is plotted on the y-axis, and recall is plotted on the x-axis. Each point on the curve represents a different threshold value. As the threshold varies, the balance between precision and recall changes:

  • High Precision and Low Recall: This indicates that the model is very accurate in its positive predictions but fails to capture a significant number of actual positive cases.

  • Low Precision and High Recall: This suggests that the model captures most of the positive cases but at the expense of making more false positive errors.

The ideal scenario is to have a curve that is as close to the top-right corner as possible, indicating high precision and high recall simultaneously.

Computing the PR Curve

To compute the PR curve, we need the true labels and the predicted probabilities for the positive class. Here's an example using scikit-learn:

from sklearn.metrics import precision_recall_curve

# In practice, y_scores would come from a trained classifier, for example:
# from sklearn.linear_model import LogisticRegression
# model = LogisticRegression()
# model.fit(X_train, y_train)
# y_scores = model.predict_proba(X_test)[:, 1]  # probabilities for the positive class
# Assume you have the true labels (y_true) and predicted probabilities (y_scores)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_scores = [0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2]

# Compute precision-recall curve
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

In this code snippet, y_true represents the true labels, and y_scores represents the predicted probabilities for the positive class. The precision_recall_curve function returns three arrays:

  • precision: An array of precision values at different thresholds.

  • recall: An array of recall values at different thresholds.

  • thresholds: An array of threshold values used to compute precision and recall.

Plotting the PR Curve

To visualize the PR curve, we can use matplotlib:

import matplotlib.pyplot as plt

# Plot precision-recall curve
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, marker='.', label='Precision-Recall curve')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='lower left')
plt.grid(True)
plt.show()

This code will generate a plot of the PR curve, with precision on the y-axis and recall on the x-axis.

Threshold Selection using the PR Curve

The PR curve can be used to select an appropriate threshold for making predictions. By examining the curve, you can find the point where precision begins to drop significantly and set the threshold just before this drop.

This allows you to balance both precision and recall effectively. Once the threshold is identified, predictions can be made by checking whether the model's score for each instance is greater than or equal to this threshold.
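
As an illustration, here is a minimal sketch of one way to do this with the arrays returned by precision_recall_curve: pick the lowest threshold whose precision meets a target (the target of 0.80 below is an arbitrary assumption for this example), which gives the highest recall among the candidates that satisfy the precision requirement.

import numpy as np

# precision and recall have one more element than thresholds,
# so drop their last point when aligning them with threshold values
target_precision = 0.80  # assumed requirement, adjust to your application

idx = np.argmax(precision[:-1] >= target_precision)  # first threshold meeting the target
chosen_threshold = thresholds[idx]

print(f"Chosen threshold: {chosen_threshold:.2f}")
print(f"Precision: {precision[idx]:.2f}, Recall: {recall[idx]:.2f}")

# Make predictions by comparing scores against the chosen threshold
y_pred_at_threshold = (np.array(y_scores) >= chosen_threshold).astype(int)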

PR-AUC: Area Under the PR Curve

The PR-AUC (Area Under the PR Curve) is a summary metric that captures the model's performance across all thresholds.

It provides a single value to evaluate the model's overall performance, considering all possible thresholds.

A perfect classifier has a PR-AUC of 1.0, indicating perfect precision and recall at all thresholds.

On the other hand, a random classifier has a PR-AUC equal to the proportion of positive labels in the dataset, indicating no better than chance performance.

A high PR-AUC indicates a model that balances precision and recall well, while a low PR-AUC suggests room for improvement.
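
As a minimal sketch, the PR-AUC can be computed directly from the arrays returned by precision_recall_curve using sklearn's auc helper; average_precision_score is a closely related single-number summary of the PR curve that is often reported instead:

from sklearn.metrics import auc, average_precision_score

# Area under the PR curve, using the precision/recall arrays computed earlier
pr_auc = auc(recall, precision)
print(f"PR-AUC: {pr_auc:.2f}")

# Average precision, computed from the labels and scores directly
avg_precision = average_precision_score(y_true, y_scores)
print(f"Average precision: {avg_precision:.2f}")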

The Receiver Operating Characteristic (ROC) Curve

The ROC curve is another popular tool for evaluating binary classification models. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.

Understanding the ROC Curve

The ROC curve provides a visual representation of the trade-off between the benefits (true positives) and costs (false positives) of a classifier.

The goal is to shift the curve towards the top-left corner of the plot, indicating a higher rate of true positives and a lower rate of false positives.

True Positive Rate (TPR):

  • Also known as recall or sensitivity

  • The ratio of positive instances that are correctly classified as positive

True Negative Rate (TNR):

  • Also called specificity

  • The ratio of negative instances that are correctly classified as negative

False Positive Rate (FPR):

  • The ratio of negative instances that are incorrectly classified as positive

  • Equal to 1 - True Negative Rate (TNR)
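
To make these definitions concrete, here is a minimal sketch that computes all three rates from the confusion matrix of the earlier y_true/y_pred example:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # True Positive Rate (recall / sensitivity)
tnr = tn / (tn + fp)  # True Negative Rate (specificity)
fpr = fp / (fp + tn)  # False Positive Rate, equal to 1 - TNR

print(f"TPR: {tpr:.2f}, TNR: {tnr:.2f}, FPR: {fpr:.2f}")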

Computing the ROC Curve

To compute the ROC curve, we need the true labels and the predicted probabilities for the positive class. Here's an example using scikit-learn:

from sklearn.metrics import roc_curve, roc_auc_score

# Predict probabilities for the test set
# y_scores = model.predict_proba(X_test)[:, 1]  # probabilities for the positive class
# Assume you have the true labels (y_true) and predicted probabilities (y_scores)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_scores = [0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2]

# Compute ROC curve and AUC score
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = roc_auc_score(y_true, y_scores)

The roc_curve function returns three arrays:

  • fpr: An array of false positive rates at different thresholds.

  • tpr: An array of true positive rates at different thresholds.

  • thresholds: An array of threshold values used to compute FPR and TPR.

The roc_auc_score function computes the Area Under the ROC Curve (AUC-ROC), which we'll discuss later.

Plotting the ROC Curve

To visualize the ROC curve, we can use matplotlib:

import matplotlib.pyplot as plt

# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='red', lw=2, linestyle='--', label='Random Guess')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

This code will generate a plot of the ROC curve, with the False Positive Rate on the x-axis and the True Positive Rate on the y-axis.

The diagonal dashed line represents the performance of a random classifier.

ROC-AUC: Area Under the ROC Curve

The ROC-AUC is a single scalar value that summarizes the overall ability of the model to discriminate between the positive and negative classes over all possible thresholds.

Curve Analysis:

  • A curve closer to the top-left corner indicates high sensitivity and specificity, meaning the model is effective at classifying both classes correctly (high TPR, low FPR).

  • A curve near the diagonal line (from bottom-left to top-right) indicates that the classifier is performing no better than random guessing.

The ROC-AUC ranges from 0.0 to 1.0:

  • 0.5: This indicates a model with no discriminative ability, equivalent to random guessing.

  • 1.0: This represents a perfect model that correctly classifies all positive and negative instances.

  • < 0.5: This suggests a model that performs worse than random chance, often indicating serious issues in model training or data handling.

Because TPR and FPR are each computed within a single class, the ROC-AUC is not affected by the proportion of positive and negative instances. Keep in mind, though, that when the positive class is very rare, the PR curve can still be the more informative view, as discussed later.

Advantages of ROC-AUC

Key benefits are:

  • Robust to Class Imbalance: Unlike accuracy, ROC-AUC is not influenced by the number of cases in each class, making it suitable for imbalanced datasets.

  • Threshold Independence: It evaluates the model's performance across all possible thresholds, providing a comprehensive measure of its effectiveness.

  • Scale Invariance: The ROC-AUC is not affected by the scale of the scores or probabilities generated by the model, assessing performance based on the ranking of predictions.
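
To illustrate the scale-invariance point, here is a minimal sketch showing that ROC-AUC depends only on how the scores rank the instances: any strictly increasing transform of y_scores (the particular transforms below are arbitrary choices for illustration) leaves the value unchanged.

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_scores = np.array([0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2])

# Strictly increasing transforms preserve the ranking of the scores...
rescaled = 10 * y_scores - 3               # linear rescaling (no longer in [0, 1])
squashed = 1 / (1 + np.exp(-y_scores))     # sigmoid squashing

# ...so the ROC-AUC is identical for all three score vectors
print(roc_auc_score(y_true, y_scores))
print(roc_auc_score(y_true, rescaled))
print(roc_auc_score(y_true, squashed))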

Threshold Selection using the ROC Curve

The ROC curve can be used to select an appropriate threshold for making predictions. Lowering the threshold means the model classifies more instances as positive, increasing recall (the true positive rate) but also raising the false positive rate, typically at the cost of precision.

The trade-off between precision and recall needs to be managed carefully based on the application's tolerance for false positives.

The point where the precision and recall curves cross might be considered an optimal balance, especially when false positives and false negatives carry similar costs.
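
Another common heuristic, based directly on the ROC curve, is to pick the threshold that maximizes TPR - FPR (Youden's J statistic). Here is a minimal sketch using the fpr, tpr, and thresholds arrays returned by roc_curve earlier:

import numpy as np

j_scores = tpr - fpr                  # Youden's J statistic at each threshold
best_idx = np.argmax(j_scores)        # index of the threshold with the largest TPR - FPR
best_threshold = thresholds[best_idx]

print(f"Threshold maximizing TPR - FPR: {best_threshold:.2f}")
print(f"TPR: {tpr[best_idx]:.2f}, FPR: {fpr[best_idx]:.2f}")

# Make predictions by comparing scores against the chosen threshold
y_pred_at_threshold = (np.array(y_scores) >= best_threshold).astype(int)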

Practical Applications of ROC-AUC

The ROC curve is widely used in domains where it is crucial to examine how well a model can discriminate between classes under varying threshold scenarios.

Some common applications include:

  • Medical Diagnostics: Assessing the performance of diagnostic tests in correctly identifying diseases.

  • Fraud Detection: Evaluating the effectiveness of fraud detection models in identifying fraudulent transactions.

  • Information Retrieval: Measuring the ability of search engines to retrieve relevant documents.

By analyzing the ROC curve, decision-makers can select the threshold that best balances sensitivity and specificity for their specific context, often driven by the relative costs of false positives versus false negatives.

PR Curve vs. ROC Curve: When to Use Which?

While the PR curve and ROC curve are similar, they serve different purposes. The choice between them depends on the specific problem and goals:

When to Use the PR Curve

  • Imbalanced Datasets: When the positive class is rare, and the dataset is heavily imbalanced, the PR curve is more informative than the ROC curve. Examples include fraud detection and disease diagnosis.

  • Costly False Positives: If false positives are more costly or significant than false negatives, such as in spam email detection, the PR curve is more suitable as it focuses on precision.

When to Use the ROC Curve

  • More Balanced Datasets: When the dataset is more balanced or when equal emphasis is placed on the performance regarding both false positives and false negatives, the ROC curve is preferred.

The rationale behind this rule of thumb is that in imbalanced datasets with rare positive instances, the ROC curve can be misleading, showing high performance even if the model performs poorly on the minority class.

In such cases, the PR curve provides a more accurate representation of the model's performance.
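
The gap is easy to see with a quick experiment. The sketch below builds a heavily imbalanced synthetic dataset (an illustrative assumption, roughly 1% positives) and reports both ROC-AUC and average precision, a PR-curve summary; on data like this, the ROC-AUC typically looks far more flattering than the PR-based score.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Synthetic dataset with roughly 1% positive instances
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.99, 0.01], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

print(f"ROC-AUC:           {roc_auc_score(y_test, scores):.2f}")
print(f"Average precision: {average_precision_score(y_test, scores):.2f}")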

Conclusion

Recall, precision, and the PR and ROC curves are essential tools for evaluating binary classification models. By understanding these metrics and their computation, you can gain valuable insights into your model's performance and make informed decisions.

Remember, the choice between the PR curve and ROC curve depends on the nature of your dataset and the specific goals of your problem.

The PR curve is more suitable for imbalanced datasets or when false positives are more costly, while the ROC curve is preferred for more balanced datasets or when equal emphasis is placed on false positives and false negatives.

By leveraging these powerful metrics and visualizations, you can assess your classification models comprehensively, select appropriate thresholds, and optimize performance based on your specific requirements.

Whether you're a data scientist, researcher, or machine learning practitioner, mastering recall, precision, PR curve, and ROC curve will empower you to make data-driven decisions and build highly effective classification models.

If you like this article, share it with others ♻️

Would help a lot ❤️

And feel free to follow me for more articles like this.
