Let’s face it: data is rarely clean. Whether you're building dashboards, importing user spreadsheets, or doing analysis, data validation and cleaning are essential skills.

In this blog, you’ll learn how to use Python’s pandas library to:

Identify missing or invalid entries
Clean and transform data
Validate values and types
Prepare data for reliable use

📥 The Problem

Let’s say you get a CSV file from users like this:

csvCopyEditname,email,age,join_date
Alice,alice@example.com,30,2021-07-01
Bob,,27,not-a-date
,carol@example.com,twenty,2021-08-15
David,david@example.com,45,2021-06-25

Looks simple, but it’s full of issues:

Missing names and emails
Non-numeric age
Invalid date formats

🧪 Step 1: Load and Inspect

pythonCopyEditimport pandas as pd

df = pd.read_csv("users.csv")
print(df.info())
print(df.head())

🕵️ Step 2: Detect and Handle Missing Data

pythonCopyEditprint(df.isnull().sum())

# Drop rows with missing critical fields
df = df.dropna(subset=["name", "email"])

Or fill missing values:

pythonCopyEditdf["age"] = df["age"].fillna(0)

🔢 Step 3: Validate and Convert Types

Convert age to integers:

pythonCopyEditdef clean_age(value):
    try:
        return int(value)
    except:
        return None

df["age"] = df["age"].apply(clean_age)
df = df.dropna(subset=["age"])  # Drop rows where age couldn't be fixed

Parse dates:

pythonCopyEditdf["join_date"] = pd.to_datetime(df["join_date"], errors="coerce")
df = df.dropna(subset=["join_date"])

🧹 Step 4: Strip and Standardize

pythonCopyEditdf["name"] = df["name"].str.strip().str.title()
df["email"] = df["email"].str.lower()

✅ Final Cleaned Data

pythonCopyEditprint(df)
df.to_csv("cleaned_users.csv", index=False)

Now you have a clean dataset ready for storage, analysis, or machine learning.

⚠️ Bonus: Flag Invalid Rows Instead of Dropping

pythonCopyEditdf["valid"] = df["age"].apply(lambda x: isinstance(x, (int, float))) & \
              df["join_date"].notnull() & \
              df["name"].notnull() & df["email"].notnull()

You can export valid and invalid entries separately.

🧵 Wrapping Up

Cleaning and validating data is often 80% of the work in data projects. With pandas, you can write powerful and readable data cleanup pipelines that save time and headaches.

Clean Your Messy Data with Pandas