KNN - K Nearest Neighbor

Start date: 08/10/2025 - Doan Ngoc Cuong
Introduction: Lazy Algorithm
KNN (Lazy Learning Algorithm) is a type of Instance-Based Learning.
- Instance-Based Learning = a learning approach where the algorithm stores the training data (instances) and uses it directly to make predictions.
KNN is called a lazy learner because, when we supply training data, the algorithm does not build any model up front; it simply stores the instances and defers all computation to prediction time.
- Example: see the sketch below.
Link Reference: Why is KNN a lazy learner? - GeeksforGeeks
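A minimal sketch of this "laziness" using scikit-learn's KNeighborsClassifier (the Iris dataset here is just a stand-in, not from the original article): fit() essentially memorizes the training data, and all the distance computation happens at predict() time.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)               # "training" just stores X and y; no model is built
print(knn.predict(X[:3]))   # neighbors are searched now, at query time
```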
Distance:
KNN relies on a distance metric to find the nearest neighbors; common choices are Euclidean, Manhattan, and Minkowski distance.
Brute Force and KD-Tree
These are two strategies for the neighbor search itself: brute force compares the query point against every training point, while a KD-Tree partitions the space to prune the search, as sketched below.
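A sketch of how both options surface in scikit-learn (the parameter names below are scikit-learn's; the original text only names the concepts):

```python
from sklearn.neighbors import KNeighborsClassifier

# Brute force: compare each query point against every training point.
knn_brute = KNeighborsClassifier(n_neighbors=5, algorithm='brute',
                                 metric='euclidean')

# KD-Tree: a space-partitioning index that prunes the neighbor search,
# typically faster for low-dimensional data.
knn_kdtree = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree',
                                  metric='minkowski', p=2)  # p=2 is Euclidean
```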
Classification Report
Exactly how macro avg and weighted avg are calculated.
Suppose we have 2 classes:
- Class A → 80 samples
- Class B → 20 samples

Our model's results:

| Class | Precision | Recall | F1-score | Support |
| ----- | --------- | ------ | -------- | ------- |
| A     | 0.90      | 0.80   | 0.85     | 80      |
| B     | 0.50      | 1.00   | 0.67     | 20      |
Macro average
Formula:
$$\text{Macro avg} = \frac{\text{Metric}_A + \text{Metric}_B}{2}$$
Precision (macro) = (0.90 + 0.50) / 2 = 0.70
Recall (macro) = (0.80 + 1.00) / 2 = 0.90
F1 (macro) = (0.85 + 0.67) / 2 ≈ 0.76
Weighted average
Formula:
$$\text{Weighted avg} = \frac{\text{Metric}_A \cdot \text{Support}_A + \text{Metric}_B \cdot \text{Support}_B}{\text{Total support}}$$
Total support = 80 + 20 = 100
Precision (weighted) = (0.90×80 + 0.50×20) / 100
= (72 + 10) / 100 = 0.82
Recall (weighted) = (0.80×80 + 1.00×20) / 100
= (64 + 20) / 100 = 0.84
F1 (weighted) = (0.85×80 + 0.67×20) / 100
= (68 + 13.4) / 100 ≈ 0.814
✅ Key difference:
- Macro avg treats A and B equally (0.70 precision, 0.90 recall, 0.76 F1).
- Weighted avg lets Class A dominate because it has 4× more samples (0.82 precision, 0.84 recall, 0.81 F1); the sketch below verifies these numbers.
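To double-check the arithmetic, a short NumPy sketch that reproduces both averages from the per-class values in the table above:

```python
import numpy as np

support = np.array([80, 20])              # class A, class B
metrics = {
    "precision": np.array([0.90, 0.50]),
    "recall":    np.array([0.80, 1.00]),
    "f1":        np.array([0.85, 0.67]),
}

for name, m in metrics.items():
    macro    = m.mean()                             # unweighted mean
    weighted = (m * support).sum() / support.sum()  # support-weighted mean
    print(f"{name}: macro={macro:.2f}, weighted={weighted:.3f}")
```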
How to choose K
5.1 Basic Principles for Choosing k
- Use an odd k for binary classification (prevents tie votes).
- A common theoretical heuristic: start with k ≈ log(N), where N is the number of training samples (see the sketch below).
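A tiny sketch of that starting point (N = 1000 is a hypothetical training-set size):

```python
import math

N = 1000                         # hypothetical number of training samples
k = max(1, round(math.log(N)))   # log(1000) ≈ 6.9 → k = 7
if k % 2 == 0:
    k += 1                       # nudge to an odd value to avoid tie votes
print(k)                         # 7
```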
5.2 Ways to Find the Optimal k
Step 1: Cross-Validation
The model is trained on (k−1) folds and tested on the remaining fold. This is repeated k times, each time using a different fold for testing. (Here k is the number of folds, not the number of neighbors.)
Why? Cross-validation lets you compare different values of k fairly, using multiple train/test splits, which reduces the risk of overfitting to a single split.
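A minimal sketch of this step with scikit-learn's cross_val_score (Iris and the 5-fold setup are illustrative choices, not from the original):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold CV accuracy for each candidate k (odd values only)
scores = {}
for k in range(1, 31, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5, scoring='accuracy').mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, CV accuracy = {scores[best_k]:.3f}")
```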
Step 2: Error Curve
An error curve (or accuracy curve) is a plot of model performance versus the parameter value (a plotting sketch follows the bullets below).
- If k is very low => high variance; the model may show high accuracy on the training data but overfit.
- If k is very high => low variance, high bias; the model underfits.
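A minimal plotting sketch of such a curve, reusing the cross-validation idea from Step 1 (dataset and ranges are again stand-ins):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

ks = list(range(1, 31))
errors = [1 - cross_val_score(KNeighborsClassifier(n_neighbors=k),
                              X, y, cv=5).mean()   # error = 1 - accuracy
          for k in ks]

plt.plot(ks, errors, marker='o')
plt.xlabel('k (number of neighbors)')
plt.ylabel('Cross-validated error')
plt.title('Error curve for KNN')
plt.show()
```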
Bias and Variance:
- What is Bias?
- What is Variance?
In a model, bias and variance trade off against each other (see the image below):
Bias:
- Model performs poorly on both training and test data.
- High bias → the model fails to capture the underlying patterns → underfitting.

Variance:
- Model performs well on training data but poorly on test data.
- High variance → the model fits the training data too closely and fails on unseen data → overfitting.