title: "Building a Production-Grade Insurance Fraud Detection System with Streamlit, XGBoost, and MLOps"
slug: insurance-fraud-detection-mlops


excerpt: "Learn how to build a complete fraud detection pipeline using Python, XGBoost, joblib, Streamlit, and MLOps principles. This guide walks through every step, from data preprocessing to real-time dashboards."
tags:
  - machine-learning
  - streamlit
  - data-science
  - xgboost
  - mlops
Introduction
Fraudulent insurance claims cost the industry billions annually. In this post, we design and deploy a production-grade insurance fraud detection pipeline using:
XGBoost for robust classification
Streamlit for interactive dashboards
MLOps practices (modular structure, testing, Docker, CI/CD)
Whether you're an aspiring data scientist or a professional preparing for a FAANG role, this project showcases how to build real-world systems.
Project Structure
insurance-fraud-detection/
├── data/ # Raw and processed datasets
├── models/ # Saved model + scaler (joblib)
├── src/ # Modular Python scripts (optional if not using direct imports)
├── dashboards/ # Streamlit UI
├── api/ # FastAPI for REST serving
├── tests/ # Pytest unit tests
├── requirements.txt # Dependencies
├── config.yaml # Config file
├── pyproject.toml # Linting & formatting
├── Makefile # CLI workflow automation
├── .gitignore # Ignore unnecessary files
└── README.md # Full project overview
Dataset Overview
We use a simplified insurance dataset with anonymized features:
| Feature | Description |
| --- | --- |
| age | Customer's age |
| policy_sales_channel | Agent or online channel ID |
| gender | 0 = Female, 1 = Male |
| previously_insured | Has previous insurance? (0/1) |
| vehicle_age | 0 = under 1 year, 1 = 1-2 years, 2 = over 2 years |
| vehicle_damage | 0 = No, 1 = Yes |
| is_fraud | Target variable |
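If the raw data arrives with text labels rather than the numeric codes in the table, a small mapping step can produce the encoded form. The sketch below is only an illustration: the raw label strings (such as "1-2 Year") and the helper name `encode_record` are assumptions, not taken from the project's dataset or code.

```python
# Hypothetical raw labels; the real dataset's strings may differ.
GENDER_CODES = {"Female": 0, "Male": 1}
VEHICLE_AGE_CODES = {"< 1 Year": 0, "1-2 Year": 1, "> 2 Years": 2}
VEHICLE_DAMAGE_CODES = {"No": 0, "Yes": 1}


def encode_record(raw: dict) -> dict:
    """Map one raw record to the numeric codes listed in the table."""
    return {
        "age": int(raw["age"]),
        "policy_sales_channel": int(raw["policy_sales_channel"]),
        "gender": GENDER_CODES[raw["gender"]],
        "previously_insured": int(raw["previously_insured"]),
        "vehicle_age": VEHICLE_AGE_CODES[raw["vehicle_age"]],
        "vehicle_damage": VEHICLE_DAMAGE_CODES[raw["vehicle_damage"]],
    }


example = encode_record({
    "age": 34,
    "policy_sales_channel": 26,
    "gender": "Male",
    "previously_insured": 0,
    "vehicle_age": "1-2 Year",
    "vehicle_damage": "Yes",
})
print(example)
```

An unknown label raises a `KeyError`, which is usually preferable to silently passing an unexpected value to the model.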
Model Building (XGBoost + Preprocessing)
We use a scikit-learn Pipeline so that feature scaling and model inference are handled together:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
import joblib

# X_train / y_train: training split prepared beforehand (not shown here)
features = ['age', 'policy_sales_channel', 'gender', 'previously_insured',
            'vehicle_age', 'vehicle_damage']

# Scaler and classifier are bundled so both are fitted and saved together
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", XGBClassifier(n_estimators=100, max_depth=4, random_state=42))
])
pipeline.fit(X_train[features], y_train)
joblib.dump(pipeline, "models/final_model.pkl")
Because the scaler and the model live in one saved artifact, deployment only has to load a single file.
Streamlit Dashboard for Real-Time Use
We use Streamlit to:
Upload a CSV file or input data manually
Show predictions
Visualize fraud distribution
import streamlit as st
import joblib
import pandas as pd

# Path is relative to the project root, where `streamlit run` is launched
model = joblib.load("models/final_model.pkl")

user_input = {...}  # collected via st.selectbox, etc.
df = pd.DataFrame([user_input])
pred = model.predict(df)
The dashboard provides a downloadable CSV of fraud probabilities and a pie chart of the fraud distribution.
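The CSV upload and probability export could be structured roughly as follows. This is a sketch under assumptions, not the project's actual dashboard code: the helper names `load_uploaded_csv` and `build_prediction_report`, the 0.5 threshold, and the `StubModel` class (which replaces the loaded pipeline so the example runs without a trained artifact) are all hypothetical.

```python
import io

import numpy as np
import pandas as pd

FEATURES = ["age", "policy_sales_channel", "gender", "previously_insured",
            "vehicle_age", "vehicle_damage"]


def load_uploaded_csv(file_obj) -> pd.DataFrame:
    """Read an uploaded CSV and keep only the model's feature columns, in order."""
    df = pd.read_csv(file_obj)
    missing = [c for c in FEATURES if c not in df.columns]
    if missing:
        raise ValueError(f"Uploaded CSV is missing columns: {missing}")
    return df[FEATURES]


def build_prediction_report(model, df: pd.DataFrame, threshold: float = 0.5):
    """Return (csv_bytes, label_counts) for a batch of already-encoded rows."""
    proba = model.predict_proba(df)[:, 1]  # column 1 = probability of class 1 (fraud)
    report = df.copy()
    report["fraud_probability"] = proba
    report["prediction"] = (proba >= threshold).astype(int)
    # Counts per label, with both labels always present for the pie chart.
    counts = report["prediction"].value_counts().reindex([0, 1], fill_value=0)
    return report.to_csv(index=False).encode("utf-8"), counts


class StubModel:
    """Hypothetical stand-in for the loaded pipeline so the sketch runs on its own."""

    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        p = np.where(X["vehicle_damage"] == 1, 0.8, 0.1)
        return np.column_stack([1 - p, p])


# io.StringIO stands in for the file-like object returned by st.file_uploader.
fake_upload = io.StringIO(
    "age,policy_sales_channel,gender,previously_insured,vehicle_age,vehicle_damage\n"
    "41,26,1,0,1,1\n"
    "29,152,0,1,0,0\n"
)
batch = load_uploaded_csv(fake_upload)
csv_bytes, counts = build_prediction_report(StubModel(), batch)
print(counts.to_dict())
```

In the Streamlit app, `csv_bytes` could be passed to `st.download_button("Download predictions", data=csv_bytes, file_name="predictions.csv", mime="text/csv")`, and `counts` could feed the pie chart.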
Testing with PyTest
A great project is incomplete without tests:
import joblib
import pandas as pd

def test_model_prediction():
    # Load the saved pipeline and check it returns a binary label
    model = joblib.load("models/final_model.pkl")
    X_sample = pd.DataFrame([{...}])
    pred = model.predict(X_sample)
    assert pred[0] in [0, 1]
Automation via Makefile
install:
	pip install -r requirements.txt
train:
	python src/train_model.py
dashboard:
	streamlit run dashboards/streamlit_dashboard.py
Dockerization (Optional but Pro-Level)
Your Dockerfile:
FROM python:3.12
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["streamlit", "run", "dashboards/streamlit_dashboard.py"]
Build and run:
docker build -t fraud-app .
docker run -p 8501:8501 fraud-app
What's Next? Advanced Enhancements
✅ CI/CD with GitHub Actions
✅ FastAPI backend for ML inference
✅ Model versioning with MLflow
✅ Data pipeline orchestration with Prefect/Airflow
✅ Feature Store integration
✅ Monitor predictions in production
Conclusion
You’ve now built a complete, production-ready ML system:
Clean code
Modular architecture
Tested and deployed with UI
This is the kind of project that stands out on your resume, in your portfolio, or even in FAANG interviews.
Source Code
Everything is available on GitHub:
👉 https://github.com/SANJAYRAM-DS/fraud-detection-end2end
LinkedIn
👉 www.linkedin.com/in/sanjayram-data
If you liked this, follow me on Hashnode and stay tuned for more industry-level ML projects!
