k-NN Classification and Model Evaluation

mortylen

In this article, I focus on selecting evaluation metrics such as Accuracy, Precision, Recall, and F1-Score, and explain in which situations each of them is appropriate to use. We will also see how to implement k-NN in the Rust programming language. This article is intended for readers who want to understand how to evaluate simple classification models in machine learning, and I will also outline how the k-NN algorithm works. I chose k-NN for its simplicity and ease of understanding.

Introduction to k-NN Classification

The goal of classification algorithms is to predict the category or class to which a given object belongs, based on historical data.

A typical example is predicting whether an email is spam or not, whether a credit card is fraudulent, or whether a programming project will be successful based on various factors such as the programmer’s experience, technological difficulty, or the number of cups of coffee they drink throughout the day.

k-NN (k-Nearest Neighbors) Algorithm

One of the simplest and most intuitive classification algorithms is k-NN (k-Nearest Neighbors), which belongs to the family of supervised learning algorithms. This algorithm is based on the idea that objects (or data points) within the same or similar categories will be closer to each other in space.

k-NN works as follows: for each new unlabeled object (e.g., a project), it finds the "k" nearest neighbors in the training set and assigns the object to the category (e.g., successful/unsuccessful project) that prevails among those neighbors. Its advantage is simplicity of implementation and intuitive interpretation; a disadvantage can be the higher computational cost on large datasets.

Data Preparation for k-NN Classification

Before we start training the model, it’s crucial to prepare the data. The k-NN algorithm is very sensitive to the quality and format of the input data, so it’s essential to ensure that the data is correctly prepared for analysis. I will briefly mention the most important data processing steps, which you can use as a simple checklist.

Handling Missing Data

If the dataset has missing values in some attributes, we must decide whether to fill them in, remove the affected records, or replace them with a default value. In most cases, we can fill missing values with the attribute's mean (imputation) or remove the records if the number of missing values is negligible.
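As an illustration, a mean-imputation helper for a single numeric column might look like the sketch below. It assumes missing values are encoded as NaN, which is purely an illustrative convention and not something the dataset used later in this article relies on.

// A minimal sketch of mean imputation for one numeric column,
// assuming missing values are represented as f64::NAN (illustrative convention).
fn impute_with_mean(values: &mut [f64]) {
    let present: Vec<f64> = values.iter().copied().filter(|v| !v.is_nan()).collect();
    if present.is_empty() {
        return; // nothing to compute a mean from
    }
    let mean = present.iter().sum::<f64>() / present.len() as f64;
    for v in values.iter_mut() {
        if v.is_nan() {
            *v = mean; // fill the gap with the column mean
        }
    }
}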

Categorizing Data

If we have categorical data (e.g., the category "experience level": beginner, intermediate, expert), we need to encode this data into numerical values. For k-NN, numerical values are required, so we could convert these categories into numbers (e.g., 1 for beginner, 2 for intermediate, and 3 for expert).
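A minimal sketch of such an encoding, with a hypothetical helper function and labels chosen only for illustration:

// A minimal sketch of the mapping described above; the function name and the
// string labels are hypothetical, chosen only for illustration.
fn encode_experience(level: &str) -> Option<u32> {
    match level {
        "beginner" => Some(1),
        "intermediate" => Some(2),
        "expert" => Some(3),
        _ => None, // unknown category
    }
}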

Outlier Detection and Removal

It’s important to check if any data contains extreme values that could skew the results. These outliers should either be corrected or removed.
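One simple way to flag suspicious values is a z-score rule; the sketch below marks values lying more than a chosen number of standard deviations from the mean. The threshold is an illustrative choice, not a universal rule.

// A minimal sketch of flagging outliers with a simple z-score rule.
fn z_score_outliers(values: &[f64], threshold: f64) -> Vec<usize> {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();

    // Return the indices of values lying further than `threshold` standard deviations from the mean.
    values
        .iter()
        .enumerate()
        .filter(|&(_, &v)| std_dev > 0.0 && ((v - mean) / std_dev).abs() > threshold)
        .map(|(i, _)| i)
        .collect()
}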

Data Normalization/Scaling

k-NN works on the principle of calculating distances between points in the data space. If we have attributes with different ranges of values (e.g., experience from 1 to 10 and coffee from 1 to 100), some attributes may dominate the distance calculation. Therefore, it’s important to normalize or scale these values so that each attribute has an equal impact on the computation.
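For example, min-max scaling maps each attribute to the range [0, 1]. A minimal sketch for one feature column (a hypothetical helper, since the dataset used later in the article is already prepared) could look like this:

// A minimal sketch of min-max scaling a single feature column to [0, 1].
fn min_max_scale(values: &[f64]) -> Vec<f64> {
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;

    values
        .iter()
        .map(|v| if range == 0.0 { 0.0 } else { (v - min) / range })
        .collect()
}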

In the practical example, I have already prepared a clean dataset.

k-NN Algorithm Implementation

The k-Nearest Neighbors (k-NN) algorithm is simple yet very powerful for classification. Its basic idea is that an unlabeled point is classified based on the classes of its nearest neighbors in the data space.

The algorithm works as follows:

  1. For each point in the test set, calculate the distance to all points in the training set.

  2. Select the "k" nearest neighbors (using Euclidean distance or other metrics).

  3. Predict the class (e.g., successful/unsuccessful project) based on the majority class of the k nearest neighbors.

We can use various distance metrics for calculating the distance, such as Euclidean distance, Manhattan distance, or Minkowski distance.

In this article, I’ve chosen the 3 most commonly used distance measurement methods to compare and select the most appropriate one.

Code Example

Our task is to predict the success of a programming project based on three attributes:

  • Experience (the programmer’s experience).

  • Tech Difficulty (the technological difficulty).

  • Coffee (the number of cups of coffee the programmer drinks while working).

For each project in the test set, we will decide whether the project will be successful or unsuccessful based on how its attributes match with the attributes of projects in the training set. To do this, we’ll use the k-NN algorithm, which employs Euclidean distance to compute the similarity between projects.

I have divided the project into several files:

  • data.csv: Our dataset, which contains a list of projects with attributes (experience, tech difficulty, coffee, and success).

  • data.rs: Responsible for loading data from the CSV file.

  • distance.rs: Contains methods for distance calculation (Euclidean, Manhattan, and Minkowski distance).

  • metrics.rs: Contains the evaluation metrics (accuracy, precision, recall, and F1-score).

  • knn.rs: Handles the k-NN computation and prediction.

  • main.rs: Ties everything together and prints the results.

In this article, I focus only on the most important functions (from main.rs). The full project code can be found on GitHub: https://github.com/mortylen/ml-knn-metrics-rs
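Since knn.rs itself is not reproduced here, the following is only a rough sketch of what its knn function might look like; the exact implementation is in the repository. The sketch assumes DistanceFn is a function pointer of type fn(&Project, &Project, &Weights) -> f64, that Project.success is a u32 label (1 = successful), and it breaks ties in favor of the successful class.

pub fn knn(train: &[Project], query: &Project, k: usize, dist_fn: DistanceFn, weights: &Weights) -> u32 {
    // Distance from the query point to every training point, paired with its label.
    let mut neighbors: Vec<(f64, u32)> = train
        .iter()
        .map(|p| (dist_fn(p, query, weights), p.success))
        .collect();

    // Sort ascending by distance and look only at the k closest neighbors.
    neighbors.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    let successful = neighbors.iter().take(k).filter(|(_, label)| *label == 1).count();

    // Majority vote; in this sketch a tie counts as "successful".
    if successful * 2 >= k { 1 } else { 0 }
}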

main.rs:

mod data;
mod distance;
mod knn;
mod metrics;

use crate::data::{load_projects_from_csv, Project};
use crate::distance::{euclidean_distance_weighted, manhattan_distance_weighted, minkowski_distance_weighted, Weights};
use crate::knn::{knn, DistanceFn};
use crate::metrics::{accuracy, precision, recall, f1_score};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load training data from CSV
    let data = load_projects_from_csv("data/data.csv")?;

    // Split data into training and test sets - for simplicity, take the last 10 as test
    let (train_data, test_data) = data.split_at(data.len() - 10);
    let train_vec = train_data.to_vec();

    // Set "k" parameter
    let k = 3;

    // Set weights for features
    let weights = Weights {
        experience: 1.0,
        tech_difficulty: 1.0,
        coffee: 1.0,
    };

    let euclid_fn: DistanceFn = euclidean_distance_weighted;
    let manhattan_fn: DistanceFn = manhattan_distance_weighted;
    let minkowski_fn: DistanceFn = |a, b, w| minkowski_distance_weighted(a, b, w, 3.0);

    // Helper to predict labels for a test set
    fn predict_all(train: &Vec<Project>, test: &[Project], k: usize, dist_fn: DistanceFn, weights: &Weights) -> Vec<u32> {
        test.iter().map(|proj| knn(train, proj, k, dist_fn, weights)).collect()
    }

    let true_labels: Vec<u32> = test_data.iter().map(|p| p.success).collect();
    let predicted_labels_euclid = predict_all(&train_vec, test_data, k, euclid_fn, &weights);
    let predicted_labels_manhattan = predict_all(&train_vec, test_data, k, manhattan_fn, &weights);
    let predicted_labels_minkowski = predict_all(&train_vec, test_data, k, minkowski_fn, &weights);

    print_all_metrics(&true_labels, &predicted_labels_euclid, &predicted_labels_manhattan, &predicted_labels_minkowski);

    // Prediction for new input
    let new_project = Project { experience: 4, tech_difficulty: 2, coffee: 5, success: 0 }; // success is a placeholder
    println!(
        "\nPrediction for new project: experience={}, tech_difficulty={}, coffee={}",
        new_project.experience, new_project.tech_difficulty, new_project.coffee
    );
    print_single_prediction("Euclidean", knn(&train_vec, &new_project, k, euclid_fn, &weights));
    print_single_prediction("Manhattan", knn(&train_vec, &new_project, k, manhattan_fn, &weights));
    print_single_prediction("Minkowski (p=3)", knn(&train_vec, &new_project, k, minkowski_fn, &weights));

    Ok(())
}

fn print_metrics(true_labels: &[u32], predicted_labels: &[u32]) {
    let acc = accuracy(true_labels, predicted_labels);
    let prec = precision(true_labels, predicted_labels);
    let rec = recall(true_labels, predicted_labels);
    let f1 = f1_score(true_labels, predicted_labels);
    println!("Accuracy: {:.3}", acc);
    println!("Precision: {:.3}", prec);
    println!("Recall: {:.3}", rec);
    println!("F1-Score: {:.3}", f1);
}

fn print_all_metrics(true_labels: &[u32], euclid: &[u32], manhattan: &[u32], minkowski: &[u32]) {
    println!("-- Euclidean distance metrics --");
    print_metrics(true_labels, euclid);
    println!("\n-- Manhattan distance metrics --");
    print_metrics(true_labels, manhattan);
    println!("\n-- Minkowski distance (p=3) metrics --");
    print_metrics(true_labels, minkowski);
}

fn print_single_prediction(name: &str, pred: u32) {
    println!(
        "→ {} prediction: {} ({})",
        name,
        pred,
        if pred == 1 { "successful" } else { "unsuccessful" }
    );
}

Brief Explanation

  1. First, the dataset (data.csv) is loaded and split into training and test sets:

     // Load training data from CSV
     let data = load_projects_from_csv("data/data.csv")?;
    
     // Split data into training and test sets - for simplicity, take the last 10 as test
     let (train_data, test_data) = data.split_at(data.len() - 10);
     let train_vec = train_data.to_vec();
    
  2. Setting the "k" parameter and the weights for the features:

     // Set "k" parameter
     let k = 3;
    
     // Set weights for features
     let weights = Weights {
         experience: 1.0,
         tech_difficulty: 1.0,
         coffee: 1.0,
     };
    
  3. Defining the distance functions (Euclidean, Manhattan, and Minkowski distances):

     let euclid_fn: DistanceFn = euclidean_distance_weighted;
     let manhattan_fn: DistanceFn = manhattan_distance_weighted;
     let minkowski_fn: DistanceFn = |a, b, w| minkowski_distance_weighted(a, b, w, 3.0);
    
  4. Predicting the success of the projects in the test set using "k-NN":

     fn predict_all(train: &Vec<Project>, test: &[Project], k: usize, dist_fn: DistanceFn, weights: &Weights) -> Vec<u32> {
         test.iter().map(|proj| knn(train, proj, k, dist_fn, weights)).collect()
     }
    
  5. Evaluating performance (accuracy, precision, recall, and F1-score) and printing the results:

     print_all_metrics(&true_labels, &predicted_labels_euclid, &predicted_labels_manhattan, &predicted_labels_minkowski);
    
  6. Finally, predicting the outcome of a new project (experience: 4, tech_difficulty: 2, coffee: 5):

     let new_project = Project { experience: 4, tech_difficulty: 2, coffee: 5, success: 0 }; // success is a placeholder
     println!(
         "\nPrediction for new project: experience={}, tech_difficulty={}, coffee={}",
         new_project.experience, new_project.tech_difficulty, new_project.coffee
     );
     print_single_prediction("Euclidean", knn(&train_vec, &new_project, k, euclid_fn, &weights));
     print_single_prediction("Manhattan", knn(&train_vec, &new_project, k, manhattan_fn, &weights));
     print_single_prediction("Minkowski (p=3)", knn(&train_vec, &new_project, k, minkowski_fn, &weights));
    

It is worth experimenting further with different settings of the "k" parameter, adjusting the weights (for example, we could give more weight to the number of cups of coffee ☕, which certainly has a significant impact on project success 😂), adding more data to the dataset, and trying different splits of the training and test data.

Output

After running the program, we get a prediction of project success based on the attributes (experience, technological difficulty, coffee). For each project in the test set, and for the new project, the program calculates the distances to all projects in the training set, selects the 3 nearest neighbors, and decides whether the project will be successful or unsuccessful.

-- Euclidean distance metrics --
Accuracy: 0.800
Precision: 0.800
Recall: 1.000
F1-Score: 0.889

-- Manhattan distance metrics --
Accuracy: 0.800
Precision: 0.800
Recall: 1.000
F1-Score: 0.889

-- Minkowski distance (p=3) metrics --
Accuracy: 0.800
Precision: 0.800
Recall: 1.000
F1-Score: 0.889

Prediction for new project: experience=4, tech_difficulty=2, coffee=5
→ Euclidean prediction: 1 (successful)
→ Manhattan prediction: 1 (successful)
→ Minkowski (p=3) prediction: 1 (successful)

Model Performance Evaluation

Every machine learning model, including our k-NN algorithm, can be evaluated based on how well it predicts outcomes on new data. Without evaluation, we wouldn’t be able to determine whether our model is actually good or if it needs improvement. k-NN is very sensitive to data quality, the choice of the "k" parameter, and the correct evaluation method. To this end, we use various evaluation metrics to help us understand how the model behaves in different situations.

Accuracy

Accuracy is the simplest metric that tells us the percentage of correct predictions made by the model. It is the most commonly used metric, especially when we have balanced data, meaning an equal number of cases for both classes.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

  • TP: True Positives — correctly predicted positive instances.

  • TN: True Negatives — correctly predicted negative instances.

  • FP: False Positives — negative instances incorrectly predicted as positive.

  • FN: False Negatives — positive instances incorrectly predicted as negative.

Advantages:

  • Simple to understand and interpret.

  • Suitable for balanced datasets where the classes are equally represented.

Disadvantages:

  • Can be misleading when we have imbalanced data (e.g., when there are more negative cases than positive ones). If the number of negative cases is much higher than the positive ones, accuracy may show a high value even when the model never predicts the positive class. For example, if we have 95% negative cases, a model that always predicts the negative class might achieve 95% accuracy, even though it wouldn’t be useful.

pub fn accuracy(true_labels: &[u32], predicted_labels: &[u32]) -> f64 {
    let total = true_labels.len();
    let correct = true_labels.iter()
        .zip(predicted_labels.iter())
        .filter(|(t, p)| t == p)
        .count();

    correct as f64 / total as f64
}
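
To make the imbalance pitfall concrete, here is a small usage sketch of the accuracy function above with a degenerate model that always predicts the negative class:

fn main() {
    // 90% of the cases are negative; a model that always predicts 0 is useless,
    // yet its accuracy still looks impressive.
    let true_labels = vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 1];
    let predicted_labels = vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    println!("Accuracy: {:.2}", accuracy(&true_labels, &predicted_labels)); // prints 0.90
}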

Precision

Precision indicates how many of the positive predictions were actually correct. It is important when we want to minimize False Positives (incorrect positive predictions), such as in disease diagnosis, where we don’t want the model to incorrectly label a healthy person as sick.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Advantages:

  • Useful for minimizing false positives. It can be very important to prevent the model from labeling a negative case (e.g., a legitimate email or a healthy person) as positive.

  • Suitable in cases where there are high costs associated with false positives, such as disease testing, where an incorrect diagnosis could lead to unnecessary treatment.

Disadvantages:

  • Does not account for false negatives. In situations where we also care about capturing all positive cases (e.g., disease detection), precision may not be enough.

pub fn precision(true_labels: &[u32], predicted_labels: &[u32]) -> f64 {
    let tp = true_labels.iter()
        .zip(predicted_labels.iter())
        .filter(|(t, p)| **t == 1 && **p == 1)
        .count() as f64;

    let fp = predicted_labels.iter()
        .zip(true_labels.iter())
        .filter(|(p, t)| **p == 1 && **t == 0)
        .count() as f64;

    if tp + fp == 0.0 {
        0.0
    } else {
        tp / (tp + fp)
    }
}

Recall

Recall, also known as Sensitivity or True Positive Rate, indicates how many of the actual positive cases were correctly identified. It is important when we want to minimize False Negatives (incorrectly missed positive predictions), such as in rare disease detection, where it is crucial not to miss any positive cases.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Advantages:

  • Minimizes false negatives. For certain applications, it is important not to miss any positive case.

  • Suitable for cases where detecting as many positive cases as possible is important, even if it leads to a higher number of false positives.

Disadvantages:

  • Does not account for false positives, which can lead to the model predicting many incorrect positive cases.

pub fn recall(true_labels: &[u32], predicted_labels: &[u32]) -> f64 {
    let tp = true_labels.iter()
        .zip(predicted_labels.iter())
        .filter(|(t, p)| **t == 1 && **p == 1)
        .count() as f64;

    let fn_ = true_labels.iter()
        .zip(predicted_labels.iter())
        .filter(|(t, p)| **t == 1 && **p == 0)
        .count() as f64;

    if tp + fn_ == 0.0 {
        0.0
    } else {
        tp / (tp + fn_)
    }
}

F1-Score

The F1-Score is the harmonic mean between Precision and Recall. This metric is very useful when we do not want to favor one method over the other and need a balance between minimizing False Positives and False Negatives.

$$\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Advantages:

  • Balances Precision and Recall. The F1-Score is useful when we do not know whether it is more important to minimize False Positives or False Negatives.

  • Suitable for imbalanced data, where one type of error (False Positives or False Negatives) may have a greater impact.

Disadvantages:

  • Not always intuitive, as it’s not as straightforward to interpret as Accuracy or Precision.

pub fn f1_score(true_labels: &[u32], predicted_labels: &[u32]) -> f64 {
    let p = precision(true_labels, predicted_labels);
    let r = recall(true_labels, predicted_labels);

    if p + r == 0.0 {
        0.0
    } else {
        2.0 * (p * r) / (p + r)
    }
}

Which Metric is the Right One?

Choosing the right evaluation metric is crucial for effectively assessing model performance, especially in classification tasks. Accuracy, Precision, Recall, and F1-Score are just some of the many metrics used to evaluate a model. Each of these metrics has its advantages and disadvantages, and the right choice depends on the nature of the problem and the data available.

When selecting metrics, it’s important to consider what matters most to you: whether to minimize False Positives or False Negatives, or simply maximize accuracy.

Metric Comparison:

fn main() {
    let true_labels = vec![1, 0, 1, 1, 0, 1, 0, 0, 1, 0];
    let predicted_labels = vec![1, 0, 1, 0, 0, 1, 0, 1, 1, 0];

    // With these labels: TP = 4, TN = 4, FP = 1, FN = 1,
    // so Accuracy, Precision, Recall, and F1-Score all evaluate to 0.8.
    let acc = accuracy(&true_labels, &predicted_labels);
    let prec = precision(&true_labels, &predicted_labels);
    let rec = recall(&true_labels, &predicted_labels);
    let f1 = f1_score(&true_labels, &predicted_labels);

    println!("Accuracy: {}", acc);
    println!("Precision: {}", prec);
    println!("Recall: {}", rec);
    println!("F1-Score: {}", f1);
}

Evaluating the model provides key insights into how it behaves in practice, helping us decide if it’s accurate enough for deployment in the real world.

Accuracy: Let’s imagine our model predicts whether a product is "cheap" or "expensive," and we have 100 products, with 50 labeled as "cheap" and 50 as "expensive." If our model correctly predicted 80 of these 100 products, its Accuracy would be:

$$Accuracy = \frac{80}{100} = 0.8 = 80\%$$

Precision: If we have a model predicting spam emails and we don’t want the model to wrongly label legitimate emails as spam, it’s crucial that the model has high Precision. If the model labeled 20 emails as spam, of which 18 were indeed spam, the Precision would be:

$$Precision = \frac{18}{20} = 0.9 = 90\%$$

Recall: Let’s imagine a model predicting whether a patient has a disease (e.g., cancer). If it’s crucial to capture all truly sick patients, even if some healthy patients are mistakenly labeled as sick (False Positives), we would prioritize Recall. If the model caught 15 out of 20 truly sick patients, the Recall would be:

$$Recall = \frac{15}{20} = 0.75 = 75\%$$

F1-Score: In the case where a model predicts whether a product is "cheap" or "expensive" and we care about balancing the accuracy and completeness of predictions (i.e., balancing Precision and Recall), we use the F1-Score. If the model achieved Precision = 0.8 and Recall = 0.6, the F1-Score would be:

$$F1 = 2 \times \frac{0.8 \times 0.6}{0.8 + 0.6} = 0.6857 = 68.57\%$$

In the case of our example with programmers, technical difficulty of projects, and coffee consumption, F1-Score would be the most suitable metric because:

  • The classes may be imbalanced (e.g., most projects in the dataset could be successful, yet we still need to reliably catch the unsuccessful ones).

  • Coffee is just one of the factors, and a high coffee intake certainly does not guarantee that a difficult project will succeed.

  • We want a balance between precision (when the model says a project will be successful, it should be right) and recall (we want to capture as many of the actually successful projects as possible, even if the model makes some wrong predictions).

Model Tuning and Optimization

After evaluating our model and selecting the appropriate evaluation metric, the time comes to optimize it. No matter how good the model is initially, it can always be improved. We will focus on various techniques that can be used to tune a k-NN (k-Nearest Neighbors) model, particularly the selection of the optimal number of "k" (k-nearest neighbors), choosing a distance metric, and scaling the data.

Selecting the Optimal Number of k (k-Nearest Neighbors)

One of the most important hyperparameters in k-NN algorithms is the number "k", which determines how many nearest neighbors are used to classify a new point.

  • Small k (e.g., k = 1): The model becomes overly sensitive to individual outliers in the data and may suffer from overfitting (it fits too much to the training data).

  • Large k (e.g., k = 100): The model becomes smoother, as it makes decisions based on a larger number of neighbors. However, if "k" is too large, the model may overlook details and suffer from underfitting.

The optimal "k" can be found in several ways, for example by trying different values of "k" and observing how Accuracy, Precision, Recall, and F1-Score change. Alternatively, cross-validation can be used: the data is split into multiple subsets (folds), each candidate "k" is evaluated on every held-out fold, and we pick the value with the lowest average error.

If we use k = 3, we may notice that the model is more sensitive to details, resulting in higher Precision but lower Recall. If we set k = 7, the model will be less sensitive to fluctuations, which may improve Recall but reduce Precision.
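As a starting point, a simple search over odd values of "k" could reuse the functions from this article (knn, f1_score, euclidean_distance_weighted) and the same train/test split. This is only a sketch; proper cross-validation would average the score over several folds instead of relying on a single held-out test set.

// A hedged sketch of a simple search over "k", reusing the functions from this article.
fn find_best_k(train: &[Project], test: &[Project], weights: &Weights) -> usize {
    let true_labels: Vec<u32> = test.iter().map(|p| p.success).collect();
    let mut best_k = 1;
    let mut best_f1 = 0.0;

    for k in (1..=15).step_by(2) {
        // Odd values of k avoid ties in the binary majority vote.
        let predicted: Vec<u32> = test
            .iter()
            .map(|p| knn(train, p, k, euclidean_distance_weighted, weights))
            .collect();
        let f1 = f1_score(&true_labels, &predicted);
        println!("k = {:2} -> F1 = {:.3}", k, f1);
        if f1 > best_f1 {
            best_f1 = f1;
            best_k = k;
        }
    }
    best_k
}

Calling find_best_k(&train_vec, test_data, &weights) in main would print the F1-Score for every tested "k" and return the best-performing value.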

Selecting the Distance Metric

The k-NN algorithm also depends on the selection of the metric used to measure the distance between points. While Euclidean distance is the most common, there are other metrics that may be more appropriate for different problems.

Euclidean Distance (L2 norm): The most common metric, suitable for continuous data (such as our example with technical difficulty and coffee habits).

$$d(x, y) = \sqrt{ \sum_{i=1}^{n} (x_i - y_i)^2 }$$

fn euclidean_distance(x: &Vec<f32>, y: &Vec<f32>) -> f32 {
    assert_eq!(x.len(), y.len(), "Vectors must have the same length");

    x.iter()
     .zip(y.iter())
     .map(|(xi, yi)| (xi - yi).powi(2))
     .sum::<f32>()
     .sqrt()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0];
    let y = vec![4.0, 5.0, 6.0];
    let distance = euclidean_distance(&x, &y);
    println!("Euclidean distance: {}", distance);
}

Manhattan Distance (L1 norm): This metric is often used for discrete data, or when we don't want a large difference in a single attribute to dominate the distance; unlike the squared terms of the Euclidean distance, it penalizes large deviations only linearly.

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

fn manhattan_distance(x: &Vec<f32>, y: &Vec<f32>) -> f32 {
    assert_eq!(x.len(), y.len(), "Vectors must have the same length");

    x.iter()
     .zip(y.iter())
     .map(|(xi, yi)| (xi - yi).abs())
     .sum()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0];
    let y = vec![4.0, 5.0, 6.0];
    let distance = manhattan_distance(&x, &y);
    println!("Manhattan distance: {}", distance);
}

Minkowski Distance: This generalizes both of the previous distances. It includes a "p" parameter that controls how strongly large differences in individual attributes influence the result: for p = 2 we get Euclidean distance, and for p = 1 Manhattan distance.

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{\frac{1}{p}}$$

fn minkowski_distance(x: &Vec<f32>, y: &Vec<f32>, p: f32) -> f32 {
    assert_eq!(x.len(), y.len(), "Vectors must have the same length");

    x.iter()
     .zip(y.iter())
     .map(|(xi, yi)| (xi - yi).abs().powf(p))
     .sum::<f32>()
     .powf(1.0 / p)
}

fn main() {
    let x = vec![1.0, 2.0, 3.0];
    let y = vec![4.0, 5.0, 6.0];

    let distance_p1 = minkowski_distance(&x, &y, 1.0); // Manhattan distance
    let distance_p2 = minkowski_distance(&x, &y, 2.0); // Euclidean distance

    println!("Minkowski distance (p=1, Manhattan): {}", distance_p1);
    println!("Minkowski distance (p=2, Euclidean): {}", distance_p2);
}

Conclusion

Classification using k-NN is an extremely powerful tool in machine learning, especially for simple problems where it is necessary to quickly and efficiently predict the class based on historical data. However, selecting the right metric for evaluating the model is crucial for success in practice. Accuracy, precision, recall, and F1-score are all useful in different applications, and the correct metric can significantly affect the model's performance in real-world conditions.

k-NN is not perfect for all types of tasks. When focusing on improving the model's performance, we need to concentrate on optimizing parameters such as "k", selecting the appropriate distance metric, weighting the neighbors, and normalizing the data. With these improvements, k-NN can be a powerful tool in a wide range of applications and can be tailored to the needs of a specific task.

More about machine learning algorithms and metrics can be found in a separate project:

👉 https://mlcompassguide.dev/

The repository for this article can be found on GitHub:

👉 https://github.com/mortylen/ml-knn-metrics-rs

If you’ve made it this far, congratulations! I hope the article was useful and inspired you with new ideas or experiences. If you have any questions or comments, feel free to share them in the comments.
