Beginner’s Guide to RFM Analysis with Python

Introduction

Businesses often ask: "Who are our most loyal customers? Who's at risk of leaving?" The answer lies in a powerful but simple technique: RFM Analysis.

In this post, I'll walk you through RFM Analysis using real transaction data, Python code, and actionable visualizations. We'll explore how RFM can be used for customer segmentation, churn prediction, and marketing optimization. If you're a student or beginner in data science, this is your entry into real-world customer analytics.

What is RFM Analysis?

RFM stands for:

Recency (R): How recently a customer made a purchase
Frequency (F): How often they make purchases
Monetary (M): How much money they've spent
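Take one customer c who made n transactions on dates t_1, ..., t_n with amounts a_1, ..., a_n, measured against a snapshot date t_snap (the date the analysis is run as of). The three metrics can then be written as:

```latex
\begin{aligned}
R_c &= t_{\text{snap}} - \max_{1 \le i \le n} t_i \quad \text{(in days)} \\
F_c &= n \\
M_c &= \sum_{i=1}^{n} a_i
\end{aligned}
```

A smaller R_c is better, while larger F_c and M_c are better. This is why the scoring step later reverses the labels for Recency.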

These three metrics give you a compact, quantitative summary of each customer's purchasing behavior. For example:

A customer who purchased last week (low Recency), shops monthly (high Frequency), and spends a lot (high Monetary) is considered high value.
A customer who purchased 6 months ago and spent very little may be at risk of churning.

RFM analysis helps marketers prioritize, personalize, and predict.

Why RFM is Relevant Today

✉️ Email marketing: Send retention offers to inactive users
🌐 Ad targeting: Focus ad spend on high-value customers
⚖️ Churn reduction: Flag low-frequency, low-monetary customers
🧬 Customer Lifetime Value (CLTV): RFM feeds into long-term revenue prediction

It's widely used in retail and e-commerce, from large online retailers to startups that need fast, low-code solutions.

Dataset Overview

We use anonymized transaction data from a loyalty-based fuel retail system.

Key columns:

customer_id
transaction_date
amount
quantity
product_type

We'll prepare this dataset for RFM and use Python to process and visualize it.

Step-by-Step: RFM Analysis in Python

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load and prepare data
df = pd.read_csv("your_dataset.csv")
df['transaction_date'] = pd.to_datetime(df['transaction_date'])

# Define snapshot date (1 day after last transaction)
snapshot_date = df['transaction_date'].max() + pd.Timedelta(days=1)

# Group by customer: one row per customer_id, named columns via named aggregation
rfm = df.groupby('customer_id').agg(
    Recency=('transaction_date', lambda x: (snapshot_date - x.max()).days),  # days since last purchase
    Frequency=('transaction_date', 'count'),  # number of transactions
    Monetary=('amount', 'sum'),               # total amount spent
).reset_index()

You now have an RFM table with one row per customer: Recency in days, Frequency as a transaction count, and Monetary as total spend.
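If you don't have the original dataset at hand, you can still check the aggregation logic on a few transactions. The customer IDs, dates, and amounts below are made-up illustration data, not taken from the fuel-retail dataset. Only the column names match the ones listed earlier.

```python
import pandas as pd

# Hypothetical toy transactions (NOT the real dataset), same column names as above
df = pd.DataFrame({
    'customer_id': ['A', 'A', 'A', 'B', 'C', 'C'],
    'transaction_date': pd.to_datetime([
        '2024-01-05', '2024-02-10', '2024-03-01',  # A: three purchases, the last one very recent
        '2023-10-15',                              # B: a single old purchase
        '2024-02-20', '2024-02-25',                # C: two purchases
    ]),
    'amount': [40.0, 55.0, 60.0, 20.0, 30.0, 35.0],
})

# Snapshot date: one day after the latest transaction, as in the main code
snapshot_date = df['transaction_date'].max() + pd.Timedelta(days=1)

rfm = df.groupby('customer_id').agg(
    Recency=('transaction_date', lambda x: (snapshot_date - x.max()).days),
    Frequency=('transaction_date', 'count'),
    Monetary=('amount', 'sum'),
).reset_index()

print(rfm)
```

The snapshot date here is 2024-03-02. Customer A should come out with Recency 1, Frequency 3, and Monetary 155.0. B should have 139, 1, and 20.0, and C should have 6, 2, and 65.0. Checking a handful of rows by hand like this is a quick way to catch off-by-one mistakes in the snapshot date.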

RFM Score Binning (Optional but Insightful)

# Score each metric from 1 (worst) to 5 (best).
# rank(method='first') breaks ties so pd.qcut always gets 5 unique bin edges.
rfm['R_Score'] = pd.qcut(rfm['Recency'].rank(method='first'), 5, labels=[5,4,3,2,1])  # fewer days = better
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'].rank(method='first'), 5, labels=[1,2,3,4,5])

# Create combined RFM segment, e.g. "555"
rfm['RFM_Segment'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)

# Average score to rank users
rfm['RFM_Score'] = rfm[['R_Score','F_Score','M_Score']].astype(int).mean(axis=1)
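The .rank(method='first') step before pd.qcut is there because quintile binning needs five distinct bin edges. When many customers share the same value, the quantile edges repeat and pd.qcut raises a ValueError. This often happens with Frequency, where many customers buy only once. Ranking first gives every customer a unique position. Here is a small demonstration on made-up frequency values:

```python
import pandas as pd

# Hypothetical frequencies: most customers bought only once
freq = pd.Series([1, 1, 1, 1, 1, 1, 1, 2, 3, 8])

try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    raw_binning_failed = False
except ValueError:
    # Several quantile edges equal 1, so qcut refuses to build the bins
    raw_binning_failed = True

# Ranking first makes every value unique, so five equal-sized bins always exist
f_score = pd.qcut(freq.rank(method='first'), 5, labels=[1, 2, 3, 4, 5])

print(raw_binning_failed)
print(f_score.value_counts().sort_index())
```

The trade-off is that ties are split by row order. Two customers with identical Frequency can therefore land in neighbouring quintiles. In exchange, you always get five groups of (nearly) equal size.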

This scoring system helps you create user segments like:

555: Champions
111: Lost customers
345: Potential loyalists
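One way to turn the three-digit codes into named segments is a small rule-based lookup. Only the 555, 111, and 345 examples come from the list above. The function name label_segment, the "At risk" and "Others" labels, and the score thresholds are illustrative assumptions, so adjust them to your own business rules.

```python
import pandas as pd

def label_segment(code: str) -> str:
    """Map a three-digit RFM code such as '555' to a named segment.

    Hypothetical rules for illustration, not a standard scheme.
    """
    r, f, m = (int(ch) for ch in code)
    if code == '555':
        return 'Champions'
    if code == '111':
        return 'Lost customers'
    if r >= 3 and f >= 4:
        return 'Potential loyalists'  # fairly recent and buying often, e.g. '345'
    if r <= 2 and (f >= 3 or m >= 3):
        return 'At risk'              # used to buy or spend a lot, but has gone quiet
    return 'Others'

# A few example codes; on the real table, apply it to rfm['RFM_Segment']
codes = pd.Series(['555', '111', '345', '243', '322'])
print(codes.map(label_segment).tolist())
```

On the scored table, something like rfm['Segment'] = rfm['RFM_Segment'].map(label_segment) would add the labels. The column name Segment is hypothetical. The rules are checked in order, so a code such as 555 is labeled Champions before the broader "Potential loyalists" rule can match it.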

Visualizing RFM

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
sns.histplot(rfm['Recency'], ax=axes[0], kde=True, color='skyblue')
axes[0].set_title("Recency Distribution")
sns.histplot(rfm['Frequency'], ax=axes[1], kde=True, color='orange')
axes[1].set_title("Frequency Distribution")
sns.histplot(rfm['Monetary'], ax=axes[2], kde=True, color='green')
axes[2].set_title("Monetary Distribution")
plt.tight_layout()
plt.show()
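The histograms show each metric on its own. To see how two scores interact, a common follow-up is a grid of average spend for each Recency/Frequency score pair, which sns.heatmap can draw directly. The sketch below uses a few hypothetical scored rows so it runs on its own. With the real table you would pass rfm instead of scored.

```python
import pandas as pd

# Hypothetical scored customers (same column names as the rfm table above)
scored = pd.DataFrame({
    'R_Score':  [5, 5, 5, 1, 1, 3],
    'F_Score':  [5, 5, 1, 5, 1, 3],
    'Monetary': [500.0, 300.0, 80.0, 300.0, 20.0, 120.0],
})

# Average Monetary for each (R_Score, F_Score) combination; empty cells become NaN
grid = scored.pivot_table(index='R_Score', columns='F_Score',
                          values='Monetary', aggfunc='mean')
print(grid)

# With seaborn installed, this draws the grid as a heatmap:
# import seaborn as sns; sns.heatmap(grid, annot=True, fmt='.0f', cmap='Blues')
```

Cells in the top-right corner (high R and high F scores) are where Champions sit. A high average spend in the low-R rows points to valuable customers who have gone quiet.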

Real-World Insights

In my churn prediction project:

Users with Recency > 45 days and low Frequency had high churn probability.
Users with high Monetary and recent transactions were retained longer.

Using this RFM table, I created features for a Random Forest model (see Blog #1).
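As an illustration of turning the observation above into a model feature, the sketch below adds a binary flag for customers who have been inactive for more than 45 days and buy rarely. The 45-day cut-off comes from the pattern described above. Treating an F_Score of 1 or 2 as "low frequency", the column name at_risk_flag, and the sample rows are assumptions for this example. It is not the exact feature set used in the Random Forest model.

```python
import pandas as pd

# Hypothetical RFM rows; in practice this is the scored rfm table built earlier
rfm = pd.DataFrame({
    'customer_id': ['A', 'B', 'C', 'D'],
    'Recency':     [10, 60, 90, 50],
    'Frequency':   [12, 2, 1, 9],
    'Monetary':    [900.0, 40.0, 15.0, 600.0],
    'F_Score':     [5, 2, 1, 4],
})

RECENCY_THRESHOLD_DAYS = 45  # cut-off observed in the churn project
LOW_F_SCORES = {1, 2}        # assumption: what counts as "low frequency"

# 1 = inactive for more than 45 days AND rarely buys, otherwise 0
rfm['at_risk_flag'] = (
    (rfm['Recency'] > RECENCY_THRESHOLD_DAYS)
    & rfm['F_Score'].isin(LOW_F_SCORES)
).astype(int)

# Candidate feature matrix for a classifier such as a Random Forest
features = rfm[['Recency', 'Frequency', 'Monetary', 'at_risk_flag']]
print(features)
```

Customer D is inactive for 50 days but still buys often, so the flag stays 0. Combining the two conditions, rather than using the Recency threshold alone, is what separates D from B and C.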

Next Steps & Applications

Create targeted campaigns for 555 users
Offer discounts to 111 and 211 users
Add RFM scores to your ML pipeline
Try clustering customers on their RFM values using K-Means
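For the K-Means idea, the usual tool is scikit-learn's KMeans. To keep the example dependency-light, the sketch below implements plain K-Means (Lloyd's algorithm) with NumPy on hypothetical RFM values. Two choices are assumptions worth revisiting on your own data. The first is log-transforming and then standardizing the metrics so that Monetary does not dominate the distance. The second is using k = 2.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical RFM values: columns are Recency (days), Frequency, Monetary
rfm_values = np.array([
    [3, 20, 1500.0], [5, 18, 1200.0], [2, 25, 2000.0],  # recent, frequent, high spend
    [120, 1, 30.0], [150, 2, 45.0], [200, 1, 20.0],     # long inactive, low spend
])

# log1p tames the right skew; standardizing puts the three metrics on one scale
X = np.log1p(rfm_values)
X = (X - X.mean(axis=0)) / X.std(axis=0)

labels, centroids = kmeans(X, k=2)
print(labels)
```

The first three customers should end up in one cluster and the last three in the other. Which cluster gets the number 0 depends on the random start. In a real pipeline, sklearn.cluster.KMeans(n_clusters=k, n_init=10).fit_predict(X) does the same job with k-means++ initialization and several restarts.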

Conclusion

RFM is powerful, practical, and production-ready. It helps you find what truly matters: who your best customers are.

Want to see how I used RFM for churn prediction? 👉 Read Blog #1: How I Built a Real-World Churn Prediction Model
