How KNN (k-Nearest Neighbor) Works: Know It in Just 3 Minutes

The K-Nearest Neighbor (KNN) algorithm is a supervised learning method used for classification and regression. It is a non-parametric, instance-based algorithm, meaning it makes no assumptions about the underlying data distribution. Instead, it makes predictions based on the similarity between data points.

How KNN Works

Step 1:

Choose the number of neighbors (K): K is the number of closest data points (neighbors) used to make a prediction. A small K (e.g., 1 or 3) may cause overfitting (the model becomes too sensitive to noise), while a large K (e.g., 10 or 15) may cause underfitting (the model becomes too general and ignores local patterns). The last section below shows how to find a good K empirically.

Step 2:

Measure the distances: to find the nearest neighbors, we measure the distance between the input data point and all other points in the dataset. Common distance metrics include:

Euclidean Distance (most common)

Manhattan Distance

Calculate the distance: the formula for the Euclidean distance between two points A = (x1, y1) and B = (x2, y2) is:

d(A, B) = √((x2 - x1)² + (y2 - y1)²)
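As a quick illustration (a minimal sketch with made-up example points, not part of the main program below), both metrics can be computed directly in base R:

# Two example points (hypothetical values, for illustration only)
A <- c(x = 1, y = 2)
B <- c(x = 4, y = 6)

# Euclidean distance: sqrt((x2 - x1)^2 + (y2 - y1)^2)
sqrt(sum((B - A)^2))  # 5

# Manhattan distance: |x2 - x1| + |y2 - y1|
sum(abs(B - A))       # 7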

Step 3:

Select the nearest neighbors: after calculating the distances, the algorithm selects the K closest data points.

Step 4:

For classification: the most common class (majority vote) among the K neighbors is chosen.

For regression: the average value of the K neighbors is taken.
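To make the prediction step concrete, here is a minimal sketch (with hypothetical neighbor values, purely for illustration):

# Hypothetical class labels of the K = 5 nearest neighbors
neighbor_labels <- c("setosa", "setosa", "versicolor", "setosa", "versicolor")

# Classification: majority vote
names(which.max(table(neighbor_labels)))  # "setosa"

# Hypothetical numeric targets of the K = 5 nearest neighbors
neighbor_values <- c(5.1, 4.9, 5.4, 5.0, 5.2)

# Regression: average of the neighbors' values
mean(neighbor_values)  # 5.12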

R Program

Import the libraries in R.

Note: if a package is not already installed, install it first with install.packages().

library(class)    # provides knn()
library(ggplot2)  # plotting
library(caret)    # provides createDataPartition()

Load and Prepare Data

# Load the built-in Iris dataset

data(iris)

# Scale the numeric features (standardize to mean = 0, sd = 1)

iris[, 1:4] <- scale(iris[, 1:4])

# Split the dataset into training (80%) and testing (20%) sets
set.seed(123)  # set the seed for reproducibility

trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)

iris.train <- iris[trainIndex, ]

iris.test <- iris[-trainIndex, ]
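Because createDataPartition() performs a stratified split, each species stays balanced across the two sets. A quick optional check (assuming the standard 150-row iris data):

table(iris.train$Species)  # expect 40 rows per species (120 total)
nrow(iris.test)            # expect 30 rows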

# Extract the predictor variables (X) and target labels (y)

trainX <- iris.train[, 1:4]

trainY <- iris.train$Species

testX <- iris.test[, 1:4]

testY <- iris.test$Species

Apply KNN

# Choose the number of neighbors, K = 5

k_value <- 5

# Perform KNN classification

knn.pred <- knn(train = trainX, test = testX, cl = trainY, k = k_value)

# Print the predicted class labels

print(knn.pred)

Evaluate the Model

# Create a confusion matrix (predicted vs. actual)

conf_matrix <- table(Predicted = knn.pred, Actual = testY)

print(conf_matrix)

# Calculate accuracy (correct predictions / total predictions)

accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)

print(paste("Accuracy:", round(accuracy * 100, 2), "%"))

Find the Optimal K

# Compute the test error rate for K = 1 to 15

error_rate <- numeric(15)

for (k in 1:15) {
  knn.pred <- knn(train = trainX, test = testX, cl = trainY, k = k)
  error_rate[k] <- mean(knn.pred != testY)
}

# Convert to a data frame for plotting

error_df <- data.frame(K = 1:15, Error = error_rate)

# Plot K vs. error rate

ggplot(error_df, aes(x = K, y = Error)) +
  geom_line(color = "blue") +
  geom_point(color = "red") +
  ggtitle("Error Rate vs K in KNN") +
  xlab("K (Number of Neighbors)") +
  ylab("Error Rate") +
  theme_minimal()
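The plot shows where the error bottoms out; to pick that K programmatically, one small addition to the script above is enough:

# Pick the K with the lowest test error (ties resolve to the smallest K)
best_k <- which.min(error_rate)
print(paste("Best K:", best_k))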
