Beginner's Guide to RFM Analysis — Explained with Real Data and Python

Introduction

Businesses often ask: "Who are our most loyal customers? Who's at risk of leaving?" The answer lies in a powerful but simple technique: RFM Analysis.

In this blog, I'll walk you through RFM Analysis using real transaction data, Python code, and actionable visualizations. We'll explore how RFM can be used for customer segmentation, churn prediction, and marketing optimization. If you're a student or beginner in data science, this is your entry into real-world customer analytics.


What is RFM Analysis?

RFM stands for:

  • Recency (R): How recently a customer made a purchase

  • Frequency (F): How often they make purchases

  • Monetary (M): How much money they've spent

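These three metrics give you a compact summary of customer behavior. Recency is measured in days since the last purchase, so a lower value is better, while higher Frequency and Monetary values are better. For example:

  • A customer who purchased last week (low Recency), shops monthly (high Frequency), and spends a lot (high Monetary) is considered high value.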

  • A customer who purchased 6 months ago and spent very little may be at risk of churning.

RFM analysis helps marketers prioritize, personalize, and predict.
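
To make the three definitions concrete, here is a minimal sketch on a toy table. The two customers, their dates, and their amounts are made up for illustration; only the column names match the dataset used later in this post.

```python
import pandas as pd

# Toy transactions for two hypothetical customers (illustrative values only)
tx = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B"],
    "transaction_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-03-28", "2023-10-01"]
    ),
    "amount": [40.0, 55.0, 60.0, 15.0],
})

# Reference date: one day after the most recent transaction in the data
snapshot = tx["transaction_date"].max() + pd.Timedelta(days=1)

toy_rfm = tx.groupby("customer_id").agg(
    Recency=("transaction_date", lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=("transaction_date", "count"),                            # number of transactions
    Monetary=("amount", "sum"),                                         # total spend
)
print(toy_rfm)
```

Customer A bought one day before the snapshot, three times in total, so A looks active and valuable. Customer B bought once, about six months earlier, which is the at-risk pattern described above.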


Why RFM is Relevant Today

  • ✉️ Email marketing: Send retention offers to inactive users

  • 🌐 Ad targeting: Focus ad spend on high-value customers

  • ⚖️ Churn reduction: Flag low-frequency, low-monetary customers

  • 🧬 Customer Lifetime Value (CLTV): RFM feeds into long-term revenue prediction

It's a common technique in retail and e-commerce, and it suits startups that need fast, low-code solutions.


Dataset Overview

We use anonymized transaction data from a loyalty-based fuel retail system.

Key columns:

  • customer_id

  • transaction_date

  • amount

  • quantity

  • product_type

We'll prepare this dataset for RFM and use Python to process and visualize it.
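
If you don't have a similar file at hand, a synthetic stand-in with the same columns is enough to follow along. This is only a sketch: the value ranges, the distributions, and the product names ("petrol", "diesel", "lubricant") are my assumptions, not properties of the real loyalty data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the sample is reproducible
n = 1_000

# Synthetic transactions with the same column names as the real dataset
sample = pd.DataFrame({
    "customer_id": rng.integers(1, 101, size=n),
    "transaction_date": pd.Timestamp("2024-01-01")
    + pd.to_timedelta(rng.integers(0, 180, size=n), unit="D"),
    "amount": rng.gamma(shape=2.0, scale=25.0, size=n).round(2),
    "quantity": rng.integers(1, 60, size=n),
    "product_type": rng.choice(["petrol", "diesel", "lubricant"], size=n),
})

# Basic checks worth running on any export before computing RFM
expected = {"customer_id", "transaction_date", "amount", "quantity", "product_type"}
missing = expected - set(sample.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")

print(sample.dtypes)        # transaction_date should be a datetime type
print(sample.isna().sum())  # null counts per column
```

The same column and null checks apply unchanged to a real export once it is loaded with `pd.read_csv`.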


Step-by-Step: RFM Analysis in Python

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load and prepare data
df = pd.read_csv("your_dataset.csv")
df['transaction_date'] = pd.to_datetime(df['transaction_date'])

# Define snapshot date (1 day after last transaction)
snapshot_date = df['transaction_date'].max() + pd.Timedelta(days=1)

# Aggregate per customer; named aggregation sets the column names directly
rfm = df.groupby('customer_id').agg(
    Recency=('transaction_date', lambda x: (snapshot_date - x.max()).days),  # days since last purchase
    Frequency=('transaction_date', 'count'),  # number of transactions
    Monetary=('amount', 'sum'),  # total spend
).reset_index()

You now have an RFM table for all users.


RFM Score Binning (Optional but Insightful)

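# Score each metric from 1 (worst) to 5 (best).
# Lower Recency is better, so its labels run from 5 down to 1.
# rank(method='first') breaks ties so qcut never hits duplicate bin edges.
rfm['R_Score'] = pd.qcut(rfm['Recency'].rank(method='first'), 5, labels=[5,4,3,2,1])
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'].rank(method='first'), 5, labels=[1,2,3,4,5])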

# Create combined RFM segment
rfm['RFM_Segment'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)

# Average score to rank users
rfm['RFM_Score'] = rfm[['R_Score','F_Score','M_Score']].astype(int).mean(axis=1)
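
A note on the `rank(method='first')` call used for Frequency: many customers share the same small purchase count, and `pd.qcut` raises an error when several quantile edges collapse onto the same value. The sketch below reproduces that on a made-up series and shows how ranking first avoids it.

```python
import pandas as pd

# Made-up purchase counts with many ties at 1, as is typical for Frequency
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 5, 8])

raised = False
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
except ValueError as exc:
    raised = True
    print(f"qcut failed: {exc}")

# Ranking with method='first' gives every row a distinct rank,
# so the five quantile edges are always unique
scores = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
print(scores.value_counts().sort_index())
```

The trade-off is that tied customers are split across neighbouring scores by row order, so two customers with identical Frequency can land in different bins.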

This scoring system helps you create user segments like:

  • 555: Champions

  • 111: Lost customers

  • 345: Potential loyalists
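
One way to turn score triples into names like these is a small rule function. The thresholds below are my own illustrative choices (they happen to reproduce the three examples above) and the "Other" bucket is a hypothetical catch-all; adjust both to your business.

```python
import pandas as pd

def label_segment(r: int, f: int, m: int) -> str:
    """Map 1-5 R/F/M scores to a coarse segment name (illustrative thresholds)."""
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"
    if r <= 2 and f <= 2 and m <= 2:
        return "Lost customers"
    if r >= 3 and f >= 3:
        return "Potential loyalists"
    return "Other"  # hypothetical catch-all for everything else

# Made-up score triples: 555, 111, 345, 253
scores_demo = pd.DataFrame({
    "R_Score": [5, 1, 3, 2],
    "F_Score": [5, 1, 4, 5],
    "M_Score": [5, 1, 5, 3],
})
scores_demo["Segment"] = [
    label_segment(r, f, m)
    for r, f, m in zip(scores_demo["R_Score"], scores_demo["F_Score"], scores_demo["M_Score"])
]
print(scores_demo)
```

The score columns produced by `pd.qcut` are categorical, so cast them with `.astype(int)` before passing them to a function like this.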


Visualizing RFM

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
sns.histplot(rfm['Recency'], ax=axes[0], kde=True, color='skyblue')
axes[0].set_title("Recency Distribution")
sns.histplot(rfm['Frequency'], ax=axes[1], kde=True, color='orange')
axes[1].set_title("Frequency Distribution")
sns.histplot(rfm['Monetary'], ax=axes[2], kde=True, color='green')
axes[2].set_title("Monetary Distribution")
plt.tight_layout()
plt.show()
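
Histograms show each metric on its own. To see how the metrics interact, a Recency-by-Frequency grid of average spend is a useful companion. The sketch below builds that grid on made-up scores; the numbers are illustrative only.

```python
import pandas as pd

# Made-up scored customers (illustrative values only)
demo = pd.DataFrame({
    "R_Score": [5, 5, 4, 1, 1, 2],
    "F_Score": [5, 4, 5, 1, 1, 2],
    "Monetary": [900.0, 700.0, 650.0, 40.0, 60.0, 120.0],
})

# Rows: recency score (best at the top); columns: frequency score;
# cells: mean spend of the customers in that combination
grid = demo.pivot_table(
    index="R_Score", columns="F_Score", values="Monetary", aggfunc="mean"
).sort_index(ascending=False)
print(grid)
```

Empty combinations show up as NaN. The resulting table can be passed to `sns.heatmap(grid, annot=True)` for a visual version; with the real `rfm` table, cast the score columns to `int` first.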

Real-World Insights

In my churn prediction project:

  • Users with Recency > 45 days and low Frequency had high churn probability.

  • Users with high Monetary and recent transactions were retained longer.

Using this RFM table, I created features for a Random Forest model (see Blog #1).
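
As a rough sketch of how the first rule can become a model feature: the 45-day threshold comes from the finding above, while treating "low Frequency" as "at or below the median" is my assumption for this example. The four customers are made up.

```python
import pandas as pd

# Made-up RFM rows (illustrative values only)
demo_rfm = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "Recency": [3, 60, 90, 50],
    "Frequency": [12, 1, 8, 2],
    "Monetary": [540.0, 20.0, 300.0, 35.0],
})

RECENCY_CUTOFF_DAYS = 45                                 # threshold from the finding above
low_freq_cutoff = demo_rfm["Frequency"].quantile(0.5)    # assumption: "low" = at or below median

# Binary flag that can be used directly as a feature
demo_rfm["at_risk"] = (
    (demo_rfm["Recency"] > RECENCY_CUTOFF_DAYS)
    & (demo_rfm["Frequency"] <= low_freq_cutoff)
)
print(demo_rfm)
```

Customer 3 is inactive but historically frequent, so the flag stays off. Whether such customers should count as at risk is a judgment call that depends on your data.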


Next Steps & Applications

  • Create targeted campaigns for 555 users

  • Offer discounts to 111 and 211 users

  • Add RFM scores to your ML pipeline

  • Try clustering RFM segments using K-Means
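
For the K-Means idea, a minimal sketch with scikit-learn could look like this. The six customers are made up, and the log transform, the scaling step, and the choice of two clusters are my assumptions; on real data you would compare several values of `n_clusters`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up RFM rows: three active big spenders, three inactive small spenders
demo_rfm = pd.DataFrame({
    "Recency":   [2, 5, 3, 120, 150, 140],
    "Frequency": [20, 18, 25, 1, 2, 1],
    "Monetary":  [900.0, 850.0, 1100.0, 30.0, 45.0, 25.0],
})

# RFM values are usually right-skewed, so compress them with log1p,
# then standardize so that no single metric dominates the distance
features = np.log1p(demo_rfm[["Recency", "Frequency", "Monetary"]])
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
demo_rfm["Cluster"] = kmeans.fit_predict(scaled)

# Per-cluster averages help you name the clusters afterwards
print(demo_rfm.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean())
```

Cluster numbers are arbitrary labels, so read the per-cluster averages rather than the IDs when interpreting the result.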


Conclusion

RFM is powerful, practical, and production-ready. It helps you find what truly matters: who your best customers are.

Want to see how I used RFM for churn prediction? 👉 Read Blog #1: How I Built a Real-World Churn Prediction Model



Written by

Siddhesh Toraskar

Hey! I’m Siddhesh — a student, coder, and data enthusiast. This blog is my digital notebook where I share what I learn, build, and explore in tech. From code to insights — it’s all here