What are the types of performance metrics for supervised and unsupervised machine learning?
Supervised Learning Metrics
Supervised learning involves training a model on a labelled dataset where the outcome is known. The goal is to predict the outcome for new, unseen data. The performance metrics for supervised learning are divided into two categories based on the type of prediction task: regression and classification.
1. Regression Metrics
These metrics evaluate the accuracy of models that predict continuous outcomes.
Mean Absolute Error (MAE): The average absolute difference between the predicted values and the actual values. It gives an idea of how wrong the predictions were, in the same units as the response variable.
- Example: Predicting house prices where the average error between the predicted price and the actual price is calculated.
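As a quick illustration, here is a minimal sketch using scikit-learn's `mean_absolute_error`; the price figures are made-up toy values:

```python
from sklearn.metrics import mean_absolute_error

# Hypothetical house prices (in thousands): actual vs. predicted
y_true = [250, 310, 480, 195]
y_pred = [240, 330, 450, 210]

# Average absolute difference, in the same units as the prices
print(mean_absolute_error(y_true, y_pred))  # 18.75
```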
Mean Squared Error (MSE): The average of the squares of the errors between the predicted and actual values. It emphasizes larger errors more than MAE due to squaring each term.
- Example: Predicting electricity consumption where larger errors are more critical and thus have a disproportionately large effect on MSE.
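A minimal sketch with scikit-learn's `mean_squared_error` (the consumption numbers are invented for demonstration) shows how one large error dominates:

```python
from sklearn.metrics import mean_squared_error

# Hypothetical electricity consumption (kWh): actual vs. predicted
y_true = [30, 45, 60, 80]
y_pred = [28, 50, 55, 95]

# Errors are 2, 5, 5, 15; squaring makes the 15 kWh miss dominate
print(mean_squared_error(y_true, y_pred))  # 69.75
```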
Root Mean Squared Error (RMSE): The square root of MSE, which is also in the same units as the response variable and similarly emphasizes larger errors.
- Example: Forecasting sales figures for retail, where it’s useful to consider the magnitude of error directly in terms of sales units.
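Since RMSE is just the square root of MSE, a minimal sketch (with made-up sales figures) is:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical weekly sales (units): actual vs. predicted
y_true = [120, 150, 90, 200]
y_pred = [110, 160, 100, 180]

# RMSE = sqrt(MSE), back in the original sales units
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ~13.23
```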
R-squared (Coefficient of Determination): Measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It typically ranges from 0 to 1, though it can be negative when a model fits worse than simply predicting the mean.
- Example: In real estate, determining how well the features like size, location, and age predict the price of a house.
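A minimal sketch using scikit-learn's `r2_score`; the prices here are toy values standing in for predictions from features like size, location, and age:

```python
from sklearn.metrics import r2_score

# Hypothetical house prices (in thousands): actual vs. predicted
y_true = [300, 420, 510, 280, 390]
y_pred = [310, 400, 500, 290, 380]

# Fraction of price variance explained by the model
print(r2_score(y_true, y_pred))  # ~0.977
```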
2. Classification Metrics
These metrics are used to assess models that predict categorical outcomes.
Accuracy: The ratio of correctly predicted observations to the total observations. Best used when the classes are balanced.
- Example: Diagnosing patients with a disease as positive or negative where both classes are equally important.
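A minimal sketch with scikit-learn's `accuracy_score`, using made-up diagnosis labels:

```python
from sklearn.metrics import accuracy_score

# Hypothetical diagnoses: 1 = disease present, 0 = disease absent
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Fraction of patients classified correctly (6 of 8 here)
print(accuracy_score(y_true, y_pred))  # 0.75
```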
Precision and Recall:
Precision: The ratio of correctly predicted positive observations to the total predicted positives. High precision relates to a low false positive rate.
Recall (Sensitivity): The ratio of correctly predicted positive observations to all observations in the actual positive class. High recall relates to a low false negative rate.
- Example: In spam detection, precision would be how many emails marked as spam are actually spam, and recall is how many actual spam emails were correctly identified.
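A minimal sketch of both metrics on the spam scenario, using scikit-learn's `precision_score` and `recall_score` with invented labels:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical spam labels: 1 = spam, 0 = not spam
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Precision: of the emails flagged as spam, how many really were spam?
print(precision_score(y_true, y_pred))  # 0.75 (3 of 4 flags correct)
# Recall: of the actual spam emails, how many did we catch?
print(recall_score(y_true, y_pred))     # 0.75 (3 of 4 spam caught)
```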
F1 Score: The harmonic mean of Precision and Recall. Useful when you need a balance between Precision and Recall and there is an uneven class distribution (e.g., a large number of actual negatives).
- Example: In customer churn prediction, balancing the identification of actual churners vs. ensuring not to predict too many false churners.
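A minimal sketch with scikit-learn's `f1_score`, using made-up churn labels:

```python
from sklearn.metrics import f1_score

# Hypothetical churn labels: 1 = customer churned, 0 = stayed
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

# Harmonic mean of precision (2/3) and recall (2/3)
print(f1_score(y_true, y_pred))  # ~0.67
```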
ROC-AUC Score: The area under the receiver operating characteristic (ROC) curve. It summarizes how well the model ranks positive examples above negative ones across all possible classification thresholds.
- Example: Evaluating credit scoring models where distinguishing between good and bad creditors over various threshold settings is crucial.
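Note that ROC-AUC is computed from the model's scores or probabilities, not from hard class labels. A minimal sketch with scikit-learn's `roc_auc_score` and invented credit-risk scores:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical credit labels (1 = defaulted) and model risk scores
y_true   = [0, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# AUC: probability that a random defaulter outranks a random non-defaulter
print(roc_auc_score(y_true, y_scores))  # ~0.89
```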
Unsupervised Learning Metrics
Unsupervised learning involves modelling the underlying structure or distribution in the data without explicitly provided labels. Metrics here typically measure the quality of a clustering: how compact each cluster is and how well separated the clusters are from one another.
1. Silhouette Score
Measures how similar an object is to its own cluster compared to other clusters. The value ranges from -1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighbouring clusters.
- Example: Customer segmentation where each segment or cluster should ideally contain customers with similar traits distinctly different from those in other clusters.
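A minimal sketch using scikit-learn's `silhouette_score`; the customer features and the choice of KMeans with two clusters are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical customer features: [annual spend, visits per month]
X = np.array([[500, 2], [520, 3], [480, 2],
              [2000, 10], [2100, 12], [1950, 11]])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Close to +1 here because the two segments are compact and far apart
print(silhouette_score(X, labels))
```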
2. Davies-Bouldin Index
The average, over all clusters, of each cluster's worst-case ratio of within-cluster scatter to between-cluster separation. The lower the score, the better the separation.
- Example: In market research, ensuring that identified market segments are distinct in terms of consumer behavior.
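A minimal sketch with scikit-learn's `davies_bouldin_score`; the behaviour features and two-cluster KMeans setup are toy assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# Hypothetical consumer-behaviour features for two market segments
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Lower is better: compact, well-separated clusters score near 0
print(davies_bouldin_score(X, labels))
```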
3. Calinski-Harabasz Index
The ratio of the sum of between-clusters dispersion to within-cluster dispersion. Higher scores indicate better-defined clusters.
- Example: In document clustering, evaluating how well documents are grouped into topics.
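A minimal sketch with scikit-learn's `calinski_harabasz_score`; the 2-D points stand in for document embeddings and the three-topic KMeans setup is an assumption for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

# Hypothetical 2-D document embeddings for three topics
X = np.array([[0, 0], [0.2, 0.1],
              [10, 10], [10.1, 9.8],
              [20, 0], [19.9, 0.3]])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# Higher is better: large between-cluster vs. within-cluster dispersion
print(calinski_harabasz_score(X, labels))
```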