A Supply Chain Analytics Journey: Data Ingestion and Preparation with Python

https://github.com/ahamadc/Supply-Chain-Optimisation
This project is an end-to-end solution that leverages MySQL, Python and Power BI to tackle a real-world supply chain problem. Using a dataset from Kaggle that includes various tables of historical orders, I aim to assign optimal routes for orders while minimising freight and warehouse costs under various constraints. The project will showcase the full pipeline, from raw data to insights.
The ingestion and preparation phase involves cleaning, transforming and validating the raw data using Python before it can be fed into MySQL for further analysis. Like dicing an onion before using it in a recipe.
Project Workflow
Data exploration and cleaning using Python (Pandas)
Data upload and schema creation in MySQL
Querying and joining tables using SQL
Cost and route optimization logic (Python or SQL)
Final visualizations and insights using Power BI
Understanding the Dataset
Before diving into cleaning, it’s important to understand the raw data. I explored each table by loading it into Pandas to look at the structure, key columns and any initial data quality issues.
The tables were loaded as dataframes, assigned to variables and put into a list.
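The loading step itself isn’t shown in the snippet below, so here is a minimal sketch of how the dataframes could be created, assuming the raw tables are CSV exports; the file names are illustrative assumptions, not the actual repository layout.
import pandas as pd

# Assumed file names for illustration; adjust to the actual export paths
order_list = pd.read_csv('OrderList.csv')
freight_rates = pd.read_csv('FreightRates.csv')
whcosts = pd.read_csv('WhCosts.csv')
whcapacities = pd.read_csv('WhCapacities.csv')
products_per_plant = pd.read_csv('ProductsPerPlant.csv')
vmi_customers = pd.read_csv('VmiCustomers.csv')
plant_ports = pd.read_csv('PlantPorts.csv')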
tables = [order_list, freight_rates, whcosts, whcapacities, products_per_plant, vmi_customers, plant_ports]
for i in tables:
    # Check the first few rows
    print(i.head())
    # Get a concise summary of the DataFrame
    print(i.info())
    # Get basic statistics
    print(i.describe())
The output showed that all the tables were loaded as dataframes and contain the correct columns. Data types are generally appropriate for the context, but a few changes are needed. The number of rows per table varies from tens to thousands.
Order Date column in Order List table is not set to date
The Order Date column in the Order List table needs to be converted to a datetime data type.
order_list = tables[0]
#change data type of Order Date column to date
order_list['Order Date'] = pd.to_datetime(order_list['Order Date'], format='%d/%m/%Y')
#check the data types of columns to see if change was made
print(order_list.info())
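If any rows contain dates that don’t match this format, pd.to_datetime will raise an error. A slightly more defensive variant (a sketch, not what the code above does) coerces unparseable values so they can be reviewed instead of crashing the run.
# Coerce unparseable strings to NaT so they can be inspected rather than raising an error
order_list['Order Date'] = pd.to_datetime(order_list['Order Date'], format='%d/%m/%Y', errors='coerce')

# Review any rows whose dates failed to parse
print(order_list[order_list['Order Date'].isna()])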
Some numeric columns in Freight Rates table contain non-numeric characters
First, I iterated through each character of every value in those columns to identify which non-numeric characters are present and need to be handled.
unique_non_numeric = set()
for item in freight_rates['rate']:
    s = str(item)
    for char in s:
        if not char.isdigit():
            unique_non_numeric.add(char)
print("Unique non-numeric characters:", unique_non_numeric)
The result showed that there were $ signs, commas and spaces in the values that prevented them from being recognised as float values. This was then fixed:
freight_rates['max_wgh_qty'] = freight_rates['max_wgh_qty'].astype(str).replace({',': '', ' ': ''}, regex = True).astype(float)
freight_rates['minimum cost'] = freight_rates['minimum cost'].astype(str).replace({',': '', ' ': '', r'\$': ''}, regex = True).astype(float)
freight_rates['rate'] = freight_rates['rate'].astype(str).replace({',': '', ' ': '', r'\$': ''}, regex = True).astype(float)
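A quick sanity check after the conversion helps confirm that the three columns are now numeric and that the string cleaning didn’t silently introduce missing values; a minimal sketch:
# Confirm the cleaned columns are now floats
print(freight_rates[['max_wgh_qty', 'minimum cost', 'rate']].dtypes)

# Confirm no values were lost during the string cleaning
print(freight_rates[['max_wgh_qty', 'minimum cost', 'rate']].isna().sum())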
Check null values in WhCosts table
whcosts = tables[2]
print(whcosts[whcosts['WH'].isnull()])
The row with a null value in the WH column is empty across all its columns, so it can be removed.
whcosts = whcosts.dropna(subset=['WH'])
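Since dropna returns a new dataframe, the tables list still points at the old whcosts. A small follow-up check (an illustrative sketch, not part of the original walkthrough) updates the list and counts remaining nulls in every table:
# Update the list so it references the cleaned dataframe
tables[2] = whcosts

# Count remaining nulls in every table
for t in tables:
    print(t.isnull().sum())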
Plant Ports table has unnamed columns that aren’t meant to be there
First, I need to check which columns are in the table.
print(plant_ports.columns)
The columns we need are Plant Code and Port.
plant_ports = plant_ports[['Plant Code', 'Port']]
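Selecting the two named columns works here because the layout is known. An alternative sketch, assuming the stray columns appear with 'Unnamed: N' headers (which is how Pandas labels blank header cells), drops them by pattern instead:
# Drop any columns whose header starts with 'Unnamed'
plant_ports = plant_ports.loc[:, ~plant_ports.columns.str.startswith('Unnamed')]
print(plant_ports.columns)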
Confirming naming standards with Plant Code
print(order_list['Plant Code'].unique())
print(products_per_plant['Plant Code'].unique())
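Printing the unique values lets me eyeball the two lists, but a set comparison makes any mismatch explicit; a small sketch of that check:
order_plants = set(order_list['Plant Code'].unique())
product_plants = set(products_per_plant['Plant Code'].unique())

# Plant codes that appear in one table but not the other
print("In orders but not in products:", order_plants - product_plants)
print("In products but not in orders:", product_plants - order_plants)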
The final step is to save the cleaned dataframes to new CSV files.
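A minimal sketch of that export step, assuming the output file names below (they are illustrative, not the repository's actual names):
cleaned = {
    'order_list_clean.csv': order_list,
    'freight_rates_clean.csv': freight_rates,
    'whcosts_clean.csv': whcosts,
    'whcapacities_clean.csv': whcapacities,
    'products_per_plant_clean.csv': products_per_plant,
    'vmi_customers_clean.csv': vmi_customers,
    'plant_ports_clean.csv': plant_ports,
}

# Write each cleaned dataframe to its own CSV without the index column
for filename, df in cleaned.items():
    df.to_csv(filename, index=False)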
Next Steps:
Load cleaned data into MySQL
Build SQL queries for joining and calculating optimal assignments
Visualise insights in Power BI