#2 Cleaning Raw Data for BI Analysis

kyuwon kangkyuwon kang
2 min read

After importing the raw CSV files into PostgreSQL, the first thing I tacked was cleaning and standardizing the datasets to make them analysis-ready.

What kind of cleaning?

It started with the basics:

  • Fixing blank or inconsistent values

  • Converting dates from varchar(50) to actual DATE format

  • Ensuring numeric fields (like amount, balance, price) used the NUMERIC type for accurate calculations

  • Changing phone numbers from INTEGER to TEXT to avoid overflow issues (that actually happened 🥲)

A deeper look at ALTER TABLE

to make these changes, I used PostgreSQL’s ALTER TABLE command; one of the most useful tools for refining your database schema after initial data import.

If you’re new to ALTER TABLE, or want to see how I used it to reshape my tables safely and efficiently,

Check out my quick guide on ALTER TABLE!

Want yo see the full cleaning logic?

I’ve uploaded the full SQL script used in this step to Github.

It includes

  • SQL code to clean invalid date values

  • Conversion of data types with USING

  • Structure of all final cleaned tables

View the cleaning script & data dictionary on Github

This step matters because..

Before jumping into any kind of segmengation, dashboarding, or modeling,
CLEANING = STRONG FOUNDATION!

This might not be the most glamorous part of the BI workflow, but it’s definitely one of the most critical.

Now the data is clean, it’s finally ready for segmentation and customer insights.

0
Subscribe to my newsletter

Read articles from kyuwon kang directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

kyuwon kang
kyuwon kang