KNN - K Nearest Neighbor

Start date: 08/10/2025 - Doan Ngoc Cuong
Introduction: Lazy Algorithm
KNN (Lazy Learning Algorithm) is a type of Instance-Based Learning.
- Instance-Based Learning = a learning approach where the algorithm stores the training data (instances) and uses it directly to make predictions.
KNN is called a lazy learner because, when we supply training data, the algorithm does not build any model up front; it simply stores the instances and defers all computation to prediction time.
- Example: see the sketch below.
Link Reference: Why is KNN a lazy learner? - GeeksforGeeks
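A minimal sketch of this "laziness" using scikit-learn's KNeighborsClassifier (the Iris dataset here is just a stand-in, not from the original article): fit() essentially memorizes the training data, and all the distance computation happens at predict() time.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)               # "training" just stores X and y; no model is built
print(knn.predict(X[:3]))   # neighbors are searched now, at query time
```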
Distance:
KNN relies on a distance metric to find the nearest neighbors; common choices are Euclidean, Manhattan, and Minkowski distance.
Brute Force and KD-Tree
These are two strategies for the neighbor search itself: brute force compares the query point against every training point, while a KD-Tree partitions the space to prune the search, as sketched below.
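A sketch of how both options surface in scikit-learn (the parameter names below are scikit-learn's; the original text only names the concepts):

```python
from sklearn.neighbors import KNeighborsClassifier

# Brute force: compare each query point against every training point.
knn_brute = KNeighborsClassifier(n_neighbors=5, algorithm='brute',
                                 metric='euclidean')

# KD-Tree: a space-partitioning index that prunes the neighbor search,
# typically faster for low-dimensional data.
knn_kdtree = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree',
                                  metric='minkowski', p=2)  # p=2 is Euclidean
```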
Classification Report
Exactly how macro avg and weighted avg are calculated.
Suppose we have 2 classes:
- Class A → 80 samples
- Class B → 20 samples

Our model's results:

| Class | Precision | Recall | F1-score | Support |
| ----- | --------- | ------ | -------- | ------- |
| A     | 0.90      | 0.80   | 0.85     | 80      |
| B     | 0.50      | 1.00   | 0.67     | 20      |
Macro average
Formula:
$$\text{Macro avg} = \frac{\text{Metric}_A + \text{Metric}_B}{2}$$
Precision (macro) = (0.90 + 0.50) / 2 = 0.70
Recall (macro) = (0.80 + 1.00) / 2 = 0.90
F1 (macro) = (0.85 + 0.67) / 2 ≈ 0.76
Weighted average
Formula:
$$\text{Weighted avg} = \frac{\text{Metric}_A \cdot \text{Support}_A + \text{Metric}_B \cdot \text{Support}_B}{\text{Total support}}$$
Total support = 80 + 20 = 100
Precision (weighted) = (0.90×80 + 0.50×20) / 100
= (72 + 10) / 100 = 0.82
Recall (weighted) = (0.80×80 + 1.00×20) / 100
= (64 + 20) / 100 = 0.84
F1 (weighted) = (0.85×80 + 0.67×20) / 100
= (68 + 13.4) / 100 ≈ 0.814
✅ Key difference:
- Macro avg treats A and B equally (0.70 precision, 0.90 recall, 0.76 F1).
- Weighted avg lets Class A dominate because it has 4× more samples (0.82 precision, 0.84 recall, 0.81 F1); the sketch below verifies these numbers.
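To double-check the arithmetic, a short NumPy sketch that reproduces both averages from the per-class values in the table above:

```python
import numpy as np

support = np.array([80, 20])              # class A, class B
metrics = {
    "precision": np.array([0.90, 0.50]),
    "recall":    np.array([0.80, 1.00]),
    "f1":        np.array([0.85, 0.67]),
}

for name, m in metrics.items():
    macro    = m.mean()                             # unweighted mean
    weighted = (m * support).sum() / support.sum()  # support-weighted mean
    print(f"{name}: macro={macro:.2f}, weighted={weighted:.3f}")
```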
How to choose K
5.1 Basic Principles for Choosing k
- Use an odd k for binary classification (prevents tie votes).
- A common theoretical heuristic: start with k ≈ log(N), where N is the number of training samples (see the sketch below).
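A tiny sketch of that starting point (N = 1000 is a hypothetical training-set size):

```python
import math

N = 1000                         # hypothetical number of training samples
k = max(1, round(math.log(N)))   # log(1000) ≈ 6.9 → k = 7
if k % 2 == 0:
    k += 1                       # nudge to an odd value to avoid tie votes
print(k)                         # 7
```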
5.2 Ways to Find the Optimal k
Step 1: Cross-Validation
The model is trained on (k−1) folds and tested on the remaining fold. This is repeated k times, each time using a different fold for testing. (Here k is the number of folds, not the number of neighbors.)
Why? Cross-validation lets you compare different values of k fairly, using multiple train/test splits, which reduces the risk of overfitting to a single split.
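A minimal sketch of this step with scikit-learn's cross_val_score (Iris and the 5-fold setup are illustrative choices, not from the original):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold CV accuracy for each candidate k (odd values only)
scores = {}
for k in range(1, 31, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5, scoring='accuracy').mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, CV accuracy = {scores[best_k]:.3f}")
```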
Step 2: Error Curve
An error curve (or accuracy curve) is a plot of model performance versus the parameter value (a plotting sketch follows the bullets below).
- If k is very low => high variance; the model may show high accuracy on the training data but overfit.
- If k is very high => low variance, high bias; the model underfits.
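A minimal plotting sketch of such a curve, reusing the cross-validation idea from Step 1 (dataset and ranges are again stand-ins):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

ks = list(range(1, 31))
errors = [1 - cross_val_score(KNeighborsClassifier(n_neighbors=k),
                              X, y, cv=5).mean()   # error = 1 - accuracy
          for k in ks]

plt.plot(ks, errors, marker='o')
plt.xlabel('k (number of neighbors)')
plt.ylabel('Cross-validated error')
plt.title('Error curve for KNN')
plt.show()
```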
Bias and Variance:
- What is Bias?
- What is Variance?
In a model, bias and variance trade off against each other (see the image below):
Bias:
- Model performs poorly on both training and test data.
- High bias → the model fails to capture the underlying patterns → underfitting.

Variance:
- Model performs well on training data but poorly on test data.
- High variance → the model fits the training data too closely and fails on unseen data → overfitting.