#2 Cleaning Raw Data for BI Analysis

After importing the raw CSV files into PostgreSQL, the first thing I tacked was cleaning and standardizing the datasets to make them analysis-ready.
What kind of cleaning?
It started with the basics:
Fixing blank or inconsistent values
Converting dates from
varchar(50)
to actualDATE
formatEnsuring numeric fields (like amount, balance, price) used the
NUMERIC
type for accurate calculationsChanging phone numbers from INTEGER to TEXT to avoid overflow issues (that actually happened 🥲)
A deeper look at ALTER TABLE
to make these changes, I used PostgreSQL’s ALTER TABLE command; one of the most useful tools for refining your database schema after initial data import.
If you’re new to ALTER TABLE, or want to see how I used it to reshape my tables safely and efficiently,
Check out my quick guide on ALTER TABLE!
Want yo see the full cleaning logic?
I’ve uploaded the full SQL script used in this step to Github.
It includes
SQL code to clean invalid date values
Conversion of data types with
USING
Structure of all final cleaned tables
View the cleaning script & data dictionary on Github
This step matters because..
Before jumping into any kind of segmengation, dashboarding, or modeling,
CLEANING = STRONG FOUNDATION!
This might not be the most glamorous part of the BI workflow, but it’s definitely one of the most critical.
Now the data is clean, it’s finally ready for segmentation and customer insights.
Subscribe to my newsletter
Read articles from kyuwon kang directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
