🤝 Understanding K-Nearest Neighbors (KNN): A Beginner’s Guide with Python


“In KNN, you are who your neighbors are.”
— Tilak Savani
🧠 Introduction
K-Nearest Neighbors (KNN) is one of the simplest and most intuitive machine learning algorithms. It’s often used for both classification and regression problems — but shines best in classification tasks.
In this blog, we’ll understand the concept behind KNN, its math, and see a hands-on example using Python.
🤔 What is KNN?
KNN is a lazy learner algorithm. It doesn’t learn a model during training. Instead, it stores the entire dataset and makes predictions only at the time of testing by looking at the k-nearest training examples.
⚙️ How KNN Works (Step-by-Step)
1. Choose the number k of neighbors.
2. Calculate the distance between the new data point and all training points.
3. Pick the k closest points.
4. For classification: return the majority class among those neighbors.
5. For regression: return the average of the neighbors' values.
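These steps translate almost line for line into code. Here is a minimal from-scratch sketch in plain NumPy (the knn_predict helper and the toy arrays are illustrative, not from any library):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k closest points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among those neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two features, two classes
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 10.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 2.5])))  # 0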
🧮 Math Behind K-Nearest Neighbors (KNN)
KNN doesn't have an explicit training phase — instead, it makes predictions by calculating distances between data points.
📏 1. Distance Calculation
The most common distance metric used is Euclidean Distance:
d(p, q) = √[(p₁ − q₁)² + (p₂ − q₂)² + ... + (pₙ − qₙ)²]
Where:
- p = (p₁, p₂, ..., pₙ) is the input data point (query)
- q = (q₁, q₂, ..., qₙ) is a point from the training data
- n is the number of features
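As a quick sanity check, the same formula takes one line in NumPy (the two sample points below are made up for illustration):

import numpy as np

p = np.array([22, 15000])  # query point (Age, Salary)
q = np.array([25, 29000])  # a training point
d = np.sqrt(np.sum((p - q) ** 2))  # Euclidean distance
print(d)  # ≈ 14000.0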
➕ Other Distance Metrics
- Manhattan Distance (L1 norm):
d(p, q) = |p₁ − q₁| + |p₂ − q₂| + ... + |pₙ − qₙ|
- Minkowski Distance (generalized):
d(p, q) = [Σ |pᵢ − qᵢ|^r]^(1/r)
- For r = 1: Manhattan Distance
- For r = 2: Euclidean Distance
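A short sketch confirms that Minkowski with r = 1 and r = 2 reproduces the two metrics above (the sample vectors are arbitrary):

import numpy as np

def minkowski(p, q, r):
    # Generalized distance: [Σ |pᵢ − qᵢ|^r]^(1/r)
    return np.sum(np.abs(p - q) ** r) ** (1 / r)

p, q = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
print(minkowski(p, q, 1))  # 7.0 -> Manhattan
print(minkowski(p, q, 2))  # 5.0 -> Euclidean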
🧮 2. Voting or Averaging
Once distances are calculated:
- For classification: take a majority vote of the k nearest neighbors' labels.
- For regression: take the average of the k nearest neighbors' target values.
To break ties, or simply to give nearby points more say, some implementations use distance-weighted voting:
weight = 1 / distance²
This gives closer neighbors more influence on the prediction.
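In scikit-learn this is a one-line switch via the weights parameter (note that scikit-learn's 'distance' option weights by 1/distance rather than 1/distance², but the idea is the same):

from sklearn.neighbors import KNeighborsClassifier

# 'uniform' (default): every neighbor gets an equal vote
# 'distance': closer neighbors get larger votes, weighted by 1/distance
model = KNeighborsClassifier(n_neighbors=3, weights='distance')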
🧪 Python Code Example (Classification)
Let’s classify whether a person will buy a product based on their age and salary.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
# Sample Data
data = {
'Age': [22, 25, 47, 52, 46, 56],
'Salary': [15000, 29000, 48000, 60000, 52000, 61000],
'Buys': [0, 0, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
X = df[['Age', 'Salary']]
y = df['Buys']
# Model: k = 3 nearest neighbors (features are unscaled here, so Salary dominates the distance)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Predict for a 30-year-old earning 40,000
new_point = pd.DataFrame([[30, 40000]], columns=['Age', 'Salary'])
print(model.predict(new_point))  # Output: [1]
📊 Visualizing the Neighbors
import seaborn as sns
sns.scatterplot(x='Age', y='Salary', hue='Buys', data=df, palette='coolwarm')
plt.scatter(30, 40000, color='black', label='New Point')
plt.title("KNN Visualization")
plt.legend()
plt.show()
🌍 Real-World Applications
| Domain | Use Case |
| --- | --- |
| Finance | Credit risk classification |
| Healthcare | Disease diagnosis from symptoms |
| E-commerce | Product recommendation engines |
| Security | Intrusion detection systems |
✅ Advantages
- Easy to implement and understand
- No explicit training phase (lazy learning)
- Works well on small datasets
⚠️ Disadvantages
- Slow at prediction time on large datasets
- Sensitive to irrelevant features
- Requires feature scaling (see the sketch below)
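Because distances are scale-dependent (in the example above, Salary swamps Age), it is standard practice to scale features before applying KNN. A minimal sketch using scikit-learn's StandardScaler in a Pipeline, reusing X and y from the classification example:

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale each feature to zero mean and unit variance, then apply KNN,
# so Age and Salary contribute comparably to the distance.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn.fit(X, y)  # X, y from the classification example above
print(knn.predict(pd.DataFrame([[30, 40000]], columns=['Age', 'Salary'])))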
🧩 Final Thoughts
K-Nearest Neighbors is a powerful tool for beginners — it requires no assumptions about data distribution and is easy to understand. By mastering KNN, you build a strong foundation for more advanced algorithms like SVM, Random Forest, and Neural Networks.
“Sometimes the best predictions come from looking at your closest neighbors.”
📬 Subscribe
If you liked this blog, follow me on Hashnode for more beginner-friendly guides in Machine Learning and Python.
Thanks for reading! 😊