🤝 Understanding K-Nearest Neighbors (KNN): A Beginner’s Guide with Python

Tilak Savani
3 min read

“In KNN, you are who your neighbors are.”

— Tilak Savani



🧠 Introduction

K-Nearest Neighbors (KNN) is one of the simplest and most intuitive machine learning algorithms. It’s often used for both classification and regression problems — but shines best in classification tasks.

In this blog, we’ll understand the concept behind KNN, its math, and see a hands-on example using Python.


🤔 What is KNN?

KNN is a lazy learner algorithm. It doesn’t learn a model during training. Instead, it stores the entire dataset and makes predictions only at the time of testing by looking at the k-nearest training examples.


⚙️ How KNN Works (Step-by-Step)

  1. Choose the number k of neighbors.

  2. Calculate the distance between the new data point and all training points.

  3. Pick the k closest points.

  4. For classification: return the majority class among those neighbors.

  5. For regression: return the average of the neighbors’ values (a from-scratch sketch of these steps follows right after this list).

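Putting these steps together, here is a minimal from-scratch sketch in plain Python (the knn_predict helper and the toy data are purely illustrative, not part of any library):

from collections import Counter
import math

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: distance from the query to every training point
    distances = [(math.dist(query, x), label) for x, label in zip(X_train, y_train)]
    # Step 3: keep the k closest points
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: majority class among those neighbors
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Tiny 2-feature example
X_train = [[1, 1], [2, 1], [8, 9], [9, 8]]
y_train = [0, 0, 1, 1]
print(knn_predict(X_train, y_train, query=[1.5, 1.2]))  # -> 0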

🧮 Math Behind K-Nearest Neighbors (KNN)

KNN doesn't have an explicit training phase — instead, it makes predictions by calculating distances between data points.

📏 1. Distance Calculation

The most common distance metric used is Euclidean Distance:

    d(p, q) = √[(p₁ − q₁)² + (p₂ − q₂)² + ... + (pₙ − qₙ)²]

Where:

  • p = (p₁, p₂, ..., pₙ) is the input data point (query)

  • q = (q₁, q₂, ..., qₙ) is a point from the training data

  • n is the number of features

➕ Other Distance Metrics

  • Manhattan Distance (L1 norm):
     d(p, q) = |p₁ − q₁| + |p₂ − q₂| + ... + |pₙ − qₙ|
  • Minkowski Distance (generalized):
     d(p, q) = [Σ |pᵢ − qᵢ|^r]^(1/r)
     For r = 1 this reduces to Manhattan distance; for r = 2, to Euclidean distance (all three are computed in the short sketch below).

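As a quick check, all three metrics are easy to compute with NumPy (the vectors below are made-up examples):

import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))          # 5.0
manhattan = np.sum(np.abs(p - q))                  # 7.0
minkowski = np.sum(np.abs(p - q) ** 3) ** (1 / 3)  # r = 3, ≈ 4.5

print(euclidean, manhattan, minkowski)
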
🧮 2. Voting or Averaging

Once distances are calculated:

  • For classification: take a majority vote of the k nearest neighbors' labels.

  • For regression: take the average of the k nearest neighbors' target values (a short KNeighborsRegressor sketch follows below).

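The regression case is exactly what scikit-learn's KNeighborsRegressor does; a minimal sketch with made-up house-price numbers:

from sklearn.neighbors import KNeighborsRegressor

# Toy data: house size (sq ft) -> price (illustrative values only)
X_size = [[600], [800], [1000], [1200], [1400]]
y_price = [150000, 180000, 210000, 260000, 310000]

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_size, y_price)

# Prediction = mean of the 3 nearest prices (180000, 210000, 260000)
print(reg.predict([[950]]))  # ≈ [216666.67]
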
If there's a tie, some implementations use distance-weighted voting:

    weight = 1 / distance²

This gives closer neighbors more influence on the prediction.

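scikit-learn exposes this through the weights parameter of KNeighborsClassifier (note that its built-in 'distance' option weights by 1/distance; a custom callable can implement the 1/distance² scheme above):

from sklearn.neighbors import KNeighborsClassifier

# weights='uniform' (default): every neighbor counts equally
# weights='distance': closer neighbors get more say (inverse-distance weighting)
model = KNeighborsClassifier(n_neighbors=5, weights='distance')

# A callable receives the neighbor distances and returns the weights,
# e.g. inverse squared distance (small epsilon avoids division by zero)
model_sq = KNeighborsClassifier(n_neighbors=5, weights=lambda d: 1.0 / (d ** 2 + 1e-9))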

🧪 Python Code Example (Classification)

Let’s classify whether a person will buy a product based on their age and salary.

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

# Sample Data
data = {
    'Age': [22, 25, 47, 52, 46, 56],
    'Salary': [15000, 29000, 48000, 60000, 52000, 61000],
    'Buys': [0, 0, 1, 1, 1, 1]
}

df = pd.DataFrame(data)
X = df[['Age', 'Salary']]
y = df['Buys']

# Model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Predict for a new person (Age = 30, Salary = 40000). Wrapping the query in a
# DataFrame keeps the feature names consistent with the ones used in fit().
new_point = pd.DataFrame([[30, 40000]], columns=['Age', 'Salary'])
print(model.predict(new_point))  # Output: [1]  (salary dominates the unscaled distance)

📊 Visualizing the Neighbors

import seaborn as sns

# Training points colored by class, plus the new query point marked in black
sns.scatterplot(x='Age', y='Salary', hue='Buys', data=df, palette='coolwarm')
plt.scatter(30, 40000, color='black', label='New Point')
plt.title("KNN Visualization")
plt.legend()
plt.show()

🌍 Real-World Applications

Domain       | Use Case
-------------|---------------------------------
Finance      | Credit risk classification
Healthcare   | Disease diagnosis from symptoms
E-commerce   | Product recommendation engines
Security     | Intrusion detection systems

✅ Advantages

  • Easy to implement and understand

  • Almost no training time (the model simply stores the data)

  • Works well on small datasets


⚠️ Disadvantages

  • Slow at prediction time on large datasets (every query is compared against all training points)

  • Sensitive to irrelevant features

  • Requires feature scaling (see the sketch below)

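The scaling point applies to the Age/Salary example above, where the salary values dwarf the ages and dominate the distance. A common fix is to put a StandardScaler in front of KNN inside a Pipeline; here is a sketch that reuses X, y, and new_point from the earlier example:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale each feature to zero mean and unit variance before measuring distances
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_scaled.fit(X, y)

print(knn_scaled.predict(new_point))  # still [1] on this tiny dataset, but Age now contributes fairly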

🧩 Final Thoughts

K-Nearest Neighbors is a powerful tool for beginners — it requires no assumptions about data distribution and is easy to understand. By mastering KNN, you build a strong foundation for more advanced algorithms like SVM, Random Forest, and Neural Networks.

“Sometimes the best predictions come from looking at your closest neighbors.”


📬 Subscribe

If you liked this blog, follow me on Hashnode for more beginner-friendly guides in Machine Learning and Python.

Thanks for reading! 😊
