K-Means Doesn’t Learn — It Just Labels Without Thinking


Imagine walking into a party...
You don’t know anyone.
There are no name tags, no signs, no seating charts — but somehow, you start noticing patterns:
That group by the buffet is talking about tech.
The ones near the speaker? All dancing.
A bunch by the window? Deep in books and quiet talks.
No one told you how to sort them.
You just did — naturally, intuitively, effortlessly.
That’s what K-Means Clustering teaches machines to do.
To see structure where no labels exist — to separate the chaos into something meaningful.
A Short Origin Story
Unsupervised learning has been around as long as humans have tried to recognize patterns without labels. But in 1957, mathematician Stuart Lloyd introduced the algorithm that would evolve into K-Means — during a study on signal quantization at Bell Labs.
It wasn't about big data back then. It was about compressing information efficiently.
Fast forward, and K-Means now powers customer segmentation, market basket analysis, gene expression grouping, and even image compression.
From audio signals to Amazon recommendations — K-Means quietly powers the structure behind the scenes.
Why You Should Care
Before we dive into math or steps, ask yourself this:
How does Spotify group users with similar music tastes?
How does Netflix suggest shows to just the right kind of viewer?
How do marketers know there are 5 core customer types, not 50?
How does Google Photos recognize your cousin's face… even if you never tagged her?
None of this is done manually. There is no army of humans labeling each user, face, or customer.
So, how does it work?
Behind the scenes, machines are looking for patterns. They’re identifying groups based on similarity — without any labels at all.
And one of the most elegant ways they do this is through K-Means Clustering. Of course, there are other clustering methods — but K-Means is a great place to start.
What is K-Means?
K-Means is a clustering algorithm that splits data into K groups based on how similar the data points are to each other.
Step-by-Step: How K-Means Actually Works
Let’s walk through the algorithm — imagine we’re trying to group students based on their math and science scores.
- Choose K (Number of Clusters)
You decide how many groups you want.
Let’s say:
“I want to split my students into 3 performance groups.”
So, K = 3.
- Randomly Place K Centroids
A centroid is the center of a cluster — like the leader or the “gravity point” of a group. At first, these are randomly scattered points in space.
- Assign Each Point to the Nearest Centroid
Now each student (dot) looks at all the centroids and says:
“Who’s closest to me?” and joins that group.
Distance is usually calculated using Euclidean distance (the straight-line distance between two points).
- Update the Centroids
After assignment, the algorithm says:
“Okay, now each group has members. Let's move the centroid to the average location of all the people in that group.”
This average is called the mean — hence the name: K-MEANS.
- Repeat Until Nothing Changes
The process of assigning points and updating centroids repeats over and over until:
The centroids stop moving, or
A maximum number of loops is reached.
At that point, you have your final clusters!
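The "who's closest to me?" comparison above uses Euclidean distance. A minimal sketch with numpy — the (math, science) scores here are made-up numbers, not real data:

```python
import numpy as np

# Two students' (math, science) scores — hypothetical values
a = np.array([70.0, 80.0])
b = np.array([74.0, 77.0])

# Euclidean distance: sqrt((70-74)^2 + (80-77)^2) = sqrt(16 + 9)
dist = np.linalg.norm(a - b)
print(dist)  # → 5.0
```

The same `np.linalg.norm` call is what the full implementation later in this post uses for its distance computations.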
Let’s See the Math
Here’s a simple overview of the math behind each step.
1. Initialization
Choose K centroids:
These are initially random points in the same space as your data.
2. Assignment Step
Each data point is assigned to the cluster whose centroid is closest:
$$\text{Cluster}(x_i) = \underset{j}{\arg\min} \; ||x_i - c_j||^2$$
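The arg-min above can be computed for all points at once with numpy broadcasting. A small sketch using made-up points and centroids:

```python
import numpy as np

# Hypothetical example: 4 two-dimensional points and 2 centroids
x = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
c = np.array([[0.0, 0.0], [5.0, 5.0]])

# Squared distance from every point to every centroid: shape (4, 2)
sq_dist = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)

# arg min over centroids gives each point's cluster label
labels = sq_dist.argmin(axis=1)
print(labels)  # → [0 0 1 1]
```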
3. Update Step
For each cluster, update the centroid by computing the mean of all data points assigned to it:
$$c_j = \frac{1}{N_j} \sum_{x_i \in S_j} x_i$$
Where:
c_j is the updated centroid of cluster j
S_j is the set of points in cluster j
N_j is the number of points in cluster j
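In code, this update is just a per-cluster mean. A sketch continuing the same hypothetical four-point example, with labels assumed to come from the assignment step:

```python
import numpy as np

# Hypothetical points and their cluster labels from the assignment step
x = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])

# c_j = (1 / N_j) * sum of all points in cluster j, for each of the 2 clusters
centroids = np.array([x[labels == j].mean(axis=0) for j in range(2)])
print(centroids)  # → [[0.1 0.05], [5.05 4.95]]
```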
4. Repeat
Repeat the assignment and update steps until the centroids no longer move significantly, or after a fixed number of iterations.
K-Means Clustering from Scratch in Python
We will implement K-Means Clustering from scratch using pure Python and numpy. We'll visualize the results using matplotlib. This is a great beginner-friendly intro to unsupervised learning.
# Step 1: Import Libraries and Generate Data
import numpy as np
import matplotlib.pyplot as plt

# Generate 2D sample data
np.random.seed(42)
data = np.random.randn(300, 2)  # 300 points, 2 features (2D)

def kmeans(data, k, max_iterations=100):
    # Step 2: Randomly select k data points as initial centroids
    indices = np.random.choice(len(data), k, replace=False)
    centroids = data[indices].copy()
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        # Step 3: Assign each data point to the nearest centroid
        for point in data:
            distances = [np.linalg.norm(point - centroid) for centroid in centroids]
            closest = distances.index(min(distances))
            clusters[closest].append(point)
        prev_centroids = centroids.copy()
        # Step 4: Recalculate each centroid as the mean of its cluster
        for i in range(k):
            if clusters[i]:  # skip empty clusters
                centroids[i] = np.mean(clusters[i], axis=0)
        # Step 5: Stop early once the centroids no longer move
        if np.allclose(prev_centroids, centroids):
            break
    return centroids, clusters

k = 3  # Number of clusters
final_centroids, final_clusters = kmeans(data, k)

# Plot the results
colors = ['red', 'green', 'blue']
for i, cluster in enumerate(final_clusters):
    cluster = np.array(cluster)
    if len(cluster) == 0:
        continue  # nothing to plot for an empty cluster
    plt.scatter(cluster[:, 0], cluster[:, 1], c=colors[i], label=f'Cluster {i+1}')

# Plot the final centroids
final_centroids = np.array(final_centroids)
plt.scatter(final_centroids[:, 0], final_centroids[:, 1],
            c='black', marker='x', s=100, label='Centroids')
plt.title('K-Means Clustering from Scratch')
plt.legend()
plt.show()
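Once you have final centroids, a natural next step is assigning a brand-new point to its nearest cluster. A minimal sketch — the centroids below are hypothetical stand-ins for the output of a k = 3 run, not actual results from the code above:

```python
import numpy as np

# Hypothetical final centroids from a k = 3 run
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])

def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to `point`."""
    distances = np.linalg.norm(centroids - point, axis=1)
    return int(distances.argmin())

print(nearest_cluster(np.array([4.8, 5.2]), centroids))  # → 1
```

This is exactly the assignment step from training, reused at prediction time — which is also how library implementations such as scikit-learn's `KMeans.predict` behave.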
Written by
Precious Robert
👋 Hi everyone, I'm Ognev Robert Precious. I've been on a journey to learn data science since 2021. It wasn’t until last year that things finally started to click. Since then, I’ve been working through projects, mostly using Kaggle datasets, and building up my understanding through practice. This blog is where I share everything I’m learning — from hands-on tutorials to projects I’ve solved, and how I approached them. I’m also a neuroscientist, and I love making learning feel approachable and engaging. Sometimes, when I find a topic hard to understand on the internet, I rewrite it in a way that makes sense to me. That’s what you’ll find here: a learning space built around curiosity, clarity, and personal growth. Let’s keep learning together!