Clean Your Messy Data with Pandas


Let’s face it: data is rarely clean. Whether you're building dashboards, importing user spreadsheets, or doing analysis, data validation and cleaning are essential skills.
In this blog, you’ll learn how to use Python’s pandas
library to:
Identify missing or invalid entries
Clean and transform data
Validate values and types
Prepare data for reliable use
📥 The Problem
Let’s say you get a CSV file from users like this:
csvCopyEditname,email,age,join_date
Alice,alice@example.com,30,2021-07-01
Bob,,27,not-a-date
,carol@example.com,twenty,2021-08-15
David,david@example.com,45,2021-06-25
Looks simple, but it’s full of issues:
Missing names and emails
Non-numeric age
Invalid date formats
🧪 Step 1: Load and Inspect
pythonCopyEditimport pandas as pd
df = pd.read_csv("users.csv")
print(df.info())
print(df.head())
🕵️ Step 2: Detect and Handle Missing Data
pythonCopyEditprint(df.isnull().sum())
# Drop rows with missing critical fields
df = df.dropna(subset=["name", "email"])
Or fill missing values:
pythonCopyEditdf["age"] = df["age"].fillna(0)
🔢 Step 3: Validate and Convert Types
Convert age
to integers:
pythonCopyEditdef clean_age(value):
try:
return int(value)
except:
return None
df["age"] = df["age"].apply(clean_age)
df = df.dropna(subset=["age"]) # Drop rows where age couldn't be fixed
Parse dates:
pythonCopyEditdf["join_date"] = pd.to_datetime(df["join_date"], errors="coerce")
df = df.dropna(subset=["join_date"])
🧹 Step 4: Strip and Standardize
pythonCopyEditdf["name"] = df["name"].str.strip().str.title()
df["email"] = df["email"].str.lower()
✅ Final Cleaned Data
pythonCopyEditprint(df)
df.to_csv("cleaned_users.csv", index=False)
Now you have a clean dataset ready for storage, analysis, or machine learning.
⚠️ Bonus: Flag Invalid Rows Instead of Dropping
pythonCopyEditdf["valid"] = df["age"].apply(lambda x: isinstance(x, (int, float))) & \
df["join_date"].notnull() & \
df["name"].notnull() & df["email"].notnull()
You can export valid and invalid entries separately.
🧵 Wrapping Up
Cleaning and validating data is often 80% of the work in data projects. With pandas
, you can write powerful and readable data cleanup pipelines that save time and headaches.
Subscribe to my newsletter
Read articles from Ashraful Islam Leon directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Ashraful Islam Leon
Ashraful Islam Leon
Passionate Software Developer | Crafting clean code and elegant solutions