How KNN (k-nearest neighbor) works: learn it in just 3 minutes

The k-nearest neighbor (KNN) algorithm is a supervised learning method used for both classification and regression. It is a non-parametric, instance-based algorithm, meaning it makes no assumptions about the underlying data distribution. Instead, it makes predictions based on the similarity between data points.
How KNN works
Step 1:
Choose the number of neighbors (k): k is the number of closest data points (neighbors) used to make a prediction. A small k (e.g., 1 or 3) may cause overfitting (too sensitive to noise), while a large k (e.g., 10 or 15) may cause underfitting (too general, ignoring local patterns).
Step 2:
Calculate distances: to find the nearest neighbors, we measure the distance between the input data point and all other points in the dataset. Common distance metrics include:
Euclidean Distance (most common)
Manhattan Distance
The formula for the Euclidean distance between two points A = (x1, y1) and B = (x2, y2):
d(A, B) = √((x2 - x1)² + (y2 - y1)²)
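To make the formula concrete, here is a minimal R sketch (using two made-up points A and B) that computes both metrics by hand:
A <- c(1, 2)
B <- c(4, 6)
# Euclidean: sqrt((4 - 1)^2 + (6 - 2)^2) = 5
euclidean <- sqrt(sum((A - B)^2))
# Manhattan: |4 - 1| + |6 - 2| = 7
manhattan <- sum(abs(A - B))
print(euclidean)
print(manhattan)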
Step 3:
Select the nearest neighbors: after calculating the distances, the algorithm selects the k closest data points.
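As a rough sketch of this step (the distance values below are made up for illustration), base R's order() gives the indices of the k smallest distances:
# Hypothetical distances from a query point to six training points
distances <- c(2.3, 0.8, 1.5, 3.1, 0.4, 2.0)
k <- 3
# Indices of the k nearest training points (here: 5, 2, 3)
nearest_idx <- order(distances)[1:k]
print(nearest_idx)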
Step 4:
Make the prediction:
For classification: the most common class (majority vote) among the k neighbors is chosen.
For regression: the average value of the k neighbors is taken.
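Here is a minimal sketch of both prediction rules, using hypothetical neighbor labels and values (not part of the Iris example below):
# Classification: majority vote among the k = 3 nearest neighbors
neighbor_labels <- c("setosa", "versicolor", "setosa")
majority_class <- names(which.max(table(neighbor_labels)))
print(majority_class)  # "setosa"
# Regression: average of the k neighbors' numeric targets
neighbor_values <- c(4.9, 6.1, 5.0)
print(mean(neighbor_values))  # 5.33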
R Program
Import libraries
Note: if a package is not already installed, install it first with install.packages().
library(class)    # provides knn()
library(ggplot2)  # plotting
library(caret)    # provides createDataPartition()
Load and Prepare Data
# Load the built-in Iris dataset
data(iris)
# Scale the numeric features (standardize to mean = 0, sd = 1)
iris[, 1:4] <- scale(iris[, 1:4])
# Split the dataset into training (80%) and testing (20%) sets
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
iris.train <- iris[trainIndex, ]
iris.test <- iris[-trainIndex, ]
# Extract predictor variables (X) and target labels (y)
trainX <- iris.train[, 1:4]
trainY <- iris.train$Species
testX <- iris.test[, 1:4]
testY <- iris.test$Species
Apply KNN
# Choose k = 5 (a common starting point)
k_value <- 5
# Perform KNN classification
knn.pred <- knn(train = trainX, test = testX, cl = trainY, k = k_value)
# Print predictions
print(knn.pred)
Evaluate Model
# Create a confusion matrix
conf_matrix <- table(Predicted = knn.pred, Actual = testY)
print(conf_matrix)
# Calculate accuracy
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(paste("Accuracy:", round(accuracy * 100, 2), "%"))
Find the Optimal k
# Finding the best k value
error_rate <- numeric(15)
for (k in 1:15) {
  knn.pred <- knn(train = trainX, test = testX, cl = trainY, k = k)
  error_rate[k] <- mean(knn.pred != testY)
}
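Beyond the plot below, one quick way (an addition to the original walkthrough, not part of it) to pull the best k straight out of the error vector is which.min():
# k with the lowest test error (ties go to the smallest k)
best_k <- which.min(error_rate)
print(paste("Best k:", best_k))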
# Convert to a data frame for plotting
error_df <- data.frame(K = 1:15, Error = error_rate)
# Plot k vs. error rate
ggplot(error_df, aes(x = K, y = Error)) +
  geom_line(color = 'blue') +
  geom_point(color = 'red') +
  ggtitle("Error Rate vs K in KNN") +
  xlab("K (Number of Neighbors)") +
  ylab("Error Rate") +
  theme_minimal()