Customer Segmentation using Clustering on the Online Retail Dataset


Introduction
Hey everyone!
I recently worked on an unsupervised machine learning project that focuses on clustering customers based on their purchase behavior using the Online Retail dataset.
The goal was to segment customers so businesses can better understand and target them through personalized marketing, improved retention strategies, and optimized customer experiences.
This post covers the process I followed, key concepts used, and the valuable insights I gained.
About the Dataset
The dataset used is the Online Retail dataset, which contains transactions for a UK-based online retailer between December 2010 and December 2011.
Total Rows: ~540,000
Features include: InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, and Country.
This dataset is widely used in customer segmentation and retail analytics problems.
Data Preprocessing
Before clustering, I performed several preprocessing steps:
Removed missing values
Dropped canceled orders (invoices whose InvoiceNo starts with "C")
Created an RFM (Recency, Frequency, Monetary) dataset based on customer purchase behavior
Standardized the data using StandardScaler so that all features are on a uniform scale
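Here is a minimal sketch of the cleaning steps above, not the project's exact code. It runs on a tiny made-up DataFrame that reuses the dataset's column names. The helper name `clean_transactions` is hypothetical, and dropping only the rows with a missing CustomerID is an assumption, since the post does not say which missing values were removed.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows without a customer and remove canceled invoices."""
    # Assumption: only rows missing CustomerID are dropped.
    df = df.dropna(subset=["CustomerID"])
    # Canceled orders have an InvoiceNo that starts with "C".
    is_canceled = df["InvoiceNo"].astype(str).str.startswith("C")
    return df.loc[~is_canceled].copy()

# Tiny made-up sample with the dataset's column names.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536381", "536390"],
    "CustomerID": [17850.0, 14527.0, None, 13047.0],
    "Quantity": [6, -1, 2, 3],
    "UnitPrice": [2.55, 27.50, 1.25, 4.00],
})
cleaned = clean_transactions(sample)
print(cleaned)
```

Scaling is left out here on purpose: it makes more sense to standardize after the RFM features exist, right before clustering.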
Feature Engineering: RFM Model
I used the RFM (Recency, Frequency, Monetary) model to extract customer behavior patterns:
Recency: How recently did the customer purchase?
Frequency: How often did they purchase?
Monetary: How much did they spend?
These features are ideal for clustering customers into meaningful groups.
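One common way to compute these three features is sketched below. The definitions are assumptions, since the post does not spell them out: Recency is days between a chosen snapshot date and the customer's last invoice, Frequency is the number of distinct invoices, and Monetary is the sum of Quantity times UnitPrice. The function name `build_rfm`, the snapshot date, and the sample rows are made up for illustration.

```python
import pandas as pd

def build_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate cleaned transactions into one RFM row per customer."""
    df = df.assign(TotalPrice=df["Quantity"] * df["UnitPrice"])
    return df.groupby("CustomerID").agg(
        # Days since the customer's most recent invoice.
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        # Number of distinct invoices, not number of line items.
        Frequency=("InvoiceNo", "nunique"),
        # Total amount spent across all line items.
        Monetary=("TotalPrice", "sum"),
    )

# Made-up transactions: customer 1 has two invoices, customer 2 has one.
transactions = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "CustomerID": [1, 1, 1, 2],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-11-10"]
    ),
    "Quantity": [2, 1, 5, 10],
    "UnitPrice": [3.0, 4.0, 1.0, 2.5],
})
rfm = build_rfm(transactions, pd.Timestamp("2011-12-10"))
print(rfm)
```

Counting distinct invoices rather than rows keeps a single large basket from looking like many purchases.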
Clustering with K-Means
I applied K-Means Clustering to group customers into different segments:
Used the Elbow Method to find the optimal number of clusters
Evaluated clustering quality using the Silhouette Score
Used PCA (Principal Component Analysis) to reduce dimensionality for 2D visualization
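The model selection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the RFM values are synthetic (three loose groups drawn with NumPy), and the range of k values, `n_init`, and the random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an RFM table (Recency, Frequency, Monetary).
rng = np.random.default_rng(42)
centers = np.array([[10, 20, 5000], [200, 2, 300], [90, 4, 3000]])
spread = np.array([5, 2, 200])
rfm_values = np.vstack(
    [c + rng.normal(0, spread, size=(50, 3)) for c in centers]
)

# Standardize so Monetary does not dominate the distance computation.
scaled = StandardScaler().fit_transform(rfm_values)

inertias, silhouettes = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    # Elbow method: plot these values against k and look for the bend.
    inertias[k] = model.inertia_
    # Silhouette score: closer to 1 means tighter, better separated clusters.
    silhouettes[k] = silhouette_score(scaled, model.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print("inertia per k:", inertias)
print("silhouette per k:", silhouettes)
print("k with the highest silhouette:", best_k)
```

The elbow is read off a plot by eye, so using the silhouette score alongside it gives a second, numeric opinion on the same choice.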
Visualization
I visualized the clusters using PCA to get a better understanding of how customers are grouped.
Different colors represent different segments, and the spread helps identify high-value or infrequent customers.
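A plot like the one described could be produced along these lines. Treat it as a sketch: the data is synthetic, the helper name `plot_segments` is hypothetical, and the color map and figure size are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def plot_segments(coords, labels):
    """Scatter the 2D PCA coordinates, colored by cluster label."""
    fig, ax = plt.subplots(figsize=(6, 4))
    points = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=15)
    ax.set_xlabel("Principal component 1")
    ax.set_ylabel("Principal component 2")
    ax.legend(*points.legend_elements(), title="Cluster")
    return fig, ax

# Synthetic stand-in for scaled RFM features with three groups.
rng = np.random.default_rng(0)
centers = np.array([[10, 20, 5000], [200, 2, 300], [90, 4, 3000]])
rfm_values = np.vstack(
    [c + rng.normal(0, [5, 2, 200], size=(50, 3)) for c in centers]
)
scaled = StandardScaler().fit_transform(rfm_values)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# Project the three scaled features onto two principal components.
pca = PCA(n_components=2)
coords = pca.fit_transform(scaled)
explained = pca.explained_variance_ratio_.sum()
print(f"Variance kept by the 2D projection: {explained:.2%}")

fig, ax = plot_segments(coords, labels)
```

Call `plt.show()` or `fig.savefig("segments.png")` to view the result. It is worth checking the explained variance first, because a 2D picture that keeps little of the variance can make clusters look more mixed than they are.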
Key Insights
Cluster 0: High-frequency, high-spending customers (Loyal Customers)
Cluster 1: Low-frequency, low-spending customers (Potential Churn)
Cluster 2: Occasional big spenders (Opportunistic)
Such segmentation helps businesses target promotions, recommend products, and allocate marketing budgets more effectively.
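Labels like these usually come from comparing per-cluster averages. The sketch below shows one way to build such a profile table. The RFM rows and cluster assignments are invented for illustration and are not results from the project, and the extra `AvgOrderValue` column is my own addition.

```python
import pandas as pd

# Made-up RFM rows with cluster labels already attached.
rfm = pd.DataFrame({
    "Recency": [5, 12, 180, 210, 60, 75],
    "Frequency": [25, 30, 1, 2, 3, 4],
    "Monetary": [5200.0, 6100.0, 150.0, 220.0, 3100.0, 2800.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# One row per cluster: size and average R, F, M values.
profile = rfm.groupby("Cluster").agg(
    Customers=("Recency", "size"),
    AvgRecency=("Recency", "mean"),
    AvgFrequency=("Frequency", "mean"),
    AvgMonetary=("Monetary", "mean"),
)
# Spend per purchase helps separate occasional big spenders from regulars.
profile["AvgOrderValue"] = profile["AvgMonetary"] / profile["AvgFrequency"]
print(profile.round(2))
```

Keep in mind that K-Means cluster numbers are arbitrary, so the segment that shows up as Cluster 0 in one run may get a different number in another. Name segments from the profile table, not from the cluster id.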
What I Learned
Importance of feature selection in unsupervised learning
Practical application of K-Means and RFM modeling
Power of PCA in visualizing high-dimensional data
How clustering can drive data-backed business decisions
Project Links
Let's Connect!
If you're also working on ML or interested in data science projects, let's connect and grow together!
Written by

Lokesh Patidar
Hey, I'm Lokesh Patidar! I'm a 2nd-year student at SATI Vidisha, passionate about AI, Machine Learning, Full-Stack Development, and DSA. What I'm learning: currently exploring Machine Learning, completed DSA and Frontend Development, now exploring Backend Development. Interests: I love solving problems, building projects, and integrating AI into real-world applications. Excited to contribute to tech communities and share my learning journey! Follow my blog for insights on AI, ML, and Full-Stack projects!