📜 K-Means Clustering in Python: A Step-by-Step Guide for Beginners

Unaiza Nouman

🚀 Introduction

Hey there! 👋
If you’ve ever heard the term “clustering” and wondered what it really means, you’re in the right place. Today, I’m going to walk you through a hands-on project where we’ll implement the K-Means clustering algorithm from scratch using Python. And don’t worry: I’ll guide you step by step, just like I would if we were sitting together with laptops open! 💻☕

We’ll be using the famous Iris dataset and will write every line of code ourselves to truly understand how clustering works. Let's dive in!

🔧 Step 1: Install Python and pip

Let’s make sure Python and pip are installed. The commands below use apt, so they assume a Debian- or Ubuntu-based Linux system.

Command:

sudo apt update

sudo apt install python3

sudo apt install python3-pip

Now, check versions to confirm installation.

Command:

python3 --version

pip3 --version

🔧 Step 2: Create a New Python File

In the terminal, go to the folder where you want to save the file and create the new file.

Command: nano kmeans.py

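This opens a blank file in the nano editor. Keep it open, since we’ll paste code into it soon! If you’d rather not juggle two terminals, you can also save with Ctrl+O (then Enter), exit with Ctrl+X, and reopen the file later with the same command.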

🔧 Step 3: Install Required Libraries

Before writing the code, let’s install the Python libraries we’ll need.

Command: sudo apt install python3-numpy python3-pandas python3-matplotlib python3-sklearn

These libraries will help us with:

  • NumPy – for math operations

  • Pandas – for handling datasets

  • Matplotlib – for plotting

  • Scikit-learn – for loading the Iris dataset
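If you want to double-check that everything installed correctly, here is a small sketch that uses only Python's standard library. The helper name `check_libraries` is my own invention for illustration. Note that the import names differ slightly from the apt package names (for example, `python3-sklearn` provides the `sklearn` module).

```python
# Quick sanity check: can Python find the four libraries?
# Uses only the standard library, so it runs even if something is missing.
from importlib.util import find_spec

# Import names (not the apt package names).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_libraries(names):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


status = check_libraries(REQUIRED)
for name, ok in status.items():
    print(f"{name:<12} {'OK' if ok else 'MISSING'}")
```

If any line says MISSING, re-run the install command above before moving on.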

🔧 Step 4: Paste the Code (In Your File)

To keep things clean and beginner-friendly, I’ve uploaded the complete Python code to my GitHub repository. You can download or clone it directly, then copy the code into the kmeans.py file from Step 2 and save it:

📂 GitHub Repo 👉 https://github.com/unaizanouman/K_means-Clustering
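To give you a feel for what a from-scratch K-Means looks like before you open the repo, here is a minimal sketch. It is not necessarily the same as the code in the repository: the function name `kmeans`, its parameters, and the tiny made-up dataset are all my assumptions for illustration. The real project uses the Iris features instead of these six toy points.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iters):
        # Euclidean distance from every point to every centroid: shape (n, k).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its nearest centroid.
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels


# Toy data with two obvious groups (a stand-in for the Iris features).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

The loop repeats two steps, assign and update, until nothing changes. Keep in mind that the result can depend on the random starting centroids, so on messier data it is common to try several seeds.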

🔧 Step 5: Run Your Python File

Now run the file using:

Command: python3 kmeans.py

🎉 Boom! A colorful scatter plot will pop up showing three clusters and their red centroids!
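Curious how a plot like that can be drawn? Here is a rough sketch with Matplotlib. The function name `plot_clusters`, the colors, and the hard-coded toy points are assumptions of mine, so the plot from the repository code may look different.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_clusters(X, labels, centroids):
    """Scatter the first two columns of X, colored by cluster, with red centroids."""
    fig, ax = plt.subplots()
    # One color per cluster label.
    ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", label="data points")
    # Centroids drawn on top as large red X markers.
    ax.scatter(centroids[:, 0], centroids[:, 1], c="red", marker="X", s=200,
               label="centroids")
    ax.set_xlabel("feature 1")
    ax.set_ylabel("feature 2")
    ax.set_title("K-Means clusters")
    ax.legend()
    return fig, ax


# Made-up points, labels, and centroids just to exercise the function.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
centroids = np.array([[1 / 3, 1 / 3], [31 / 3, 31 / 3]])

fig, ax = plot_clusters(X, labels, centroids)
```

Call `plt.show()` afterwards to pop up the window, or `fig.savefig("clusters.png")` if you are working on a machine without a display.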

🧠 What You Learned

You just:

✅ Loaded and explored the Iris dataset

✅ Implemented K-Means Clustering from scratch

✅ Understood how Euclidean Distance works

✅ Visualized clusters using Matplotlib

✅ Ran Python code confidently from the terminal

And the best part? You didn’t use any ready-made clustering model — you coded it yourself. That’s the real power of learning! 💪
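As a quick refresher on the Euclidean distance from the checklist above: it is the straight-line distance between two points, found by squaring the difference in each coordinate, adding those up, and taking the square root. Here is a tiny sketch in plain Python (the function name `euclidean_distance` is just my choice for this example):

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


p = (1.0, 2.0)
q = (4.0, 6.0)
print(euclidean_distance(p, q))  # 5.0, because sqrt(3**2 + 4**2) = 5
```

K-Means uses exactly this kind of measurement to decide which centroid each point is closest to.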

I hope this project made you feel more confident working with Python and algorithms.

📌 Stay tuned for my upcoming posts, where I’ll explore real-world datasets, use scikit-learn models, and show how clustering is used in practical applications.

📬 Questions? Suggestions? Just Wanna Say Hi?

Feel free to reach out!

📧 unaizaray@gmail.com

I'd love to hear your feedback or help if you get stuck!

