Dimensional Data Modeling for an E-Commerce Store
Introduction: Understanding the Business Process
In an online retail environment like Walmart, there are multiple critical processes involved in fulfilling customer orders. These include capturing customer details, managing product inventories, processing orders, applying promotions, and shipping items from fulfillment centers to customers. The business must track each of these processes efficiently, particularly when the same order contains items stored across different fulfillment centers, requiring centralized shipment before final delivery.
The key objectives of this model are:
Accurately track customer orders and line-item details.
Monitor fulfillment and inventory levels across multiple centers.
Handle the application of promotions and discounts on both the order and line-item levels.
Maintain historical accuracy with data changes, such as product price fluctuations and customer address changes.
Why a Dimensional Model is Ideal for Fast Analytics
Dimensional modeling is an approach used in data warehousing to simplify complex data for fast, intuitive analytics. It relies on fact tables to store measurable events (like orders) and dimension tables to store descriptive context (like customer details, product info, and promotions). This structure allows business intelligence teams to run complex queries quickly, answer critical business questions, and track KPIs over time.
In this case, the dimensional model allows for:
Easy querying of sales data, customer activity, and product stock across multiple fulfillment centers.
Historical tracking of data changes through Slowly Changing Dimensions (SCDs Type 2), ensuring accurate and auditable reporting.
Scalability to handle large transaction volumes typical in e-commerce platforms while maintaining query speed for real-time insights.
Table Schemas in PostgreSQL
Below are the table definitions for the fact and dimension tables in PostgreSQL. We’ve incorporated Slowly Changing Dimensions (SCDs Type 2) for tracking changes over time, particularly in the customer and product tables.
Fact Table: fact_orders
CREATE TABLE fact_orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES dim_customers(customer_id),
order_date DATE,
total_order_amount DECIMAL(10, 2),
original_order_amount DECIMAL(10, 2),
shipping_cost DECIMAL(10, 2),
promotion_id INT REFERENCES dim_promotions(promotion_id),
fulfillment_center_id INT REFERENCES dim_fulfillment_centers(fulfillment_center_id),
order_status_id INT REFERENCES dim_order_status(order_status_id)
);
Fact Table: fact_order_line_items
CREATE TABLE fact_order_line_items (
line_item_id SERIAL PRIMARY KEY,
order_id INT REFERENCES fact_orders(order_id),
product_id INT REFERENCES dim_products(product_id),
quantity INT,
unit_price_at_purchase DECIMAL(10, 2),
discount_applied DECIMAL(10, 2),
line_total_amount DECIMAL(10, 2)
);
Fact Table: fact_inventory_levels
CREATE TABLE fact_inventory_levels (
product_id INT REFERENCES dim_products(product_id),
fulfillment_center_id INT REFERENCES dim_fulfillment_centers(fulfillment_center_id),
inventory_date DATE,
stock_quantity INT,
PRIMARY KEY (product_id, fulfillment_center_id, inventory_date)
);
Dimension Table: dim_customers (SCD Type 2)
CREATE TABLE dim_customers (
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
date_of_birth DATE,
gender VARCHAR(10),
street_address VARCHAR(255),
city VARCHAR(50),
state_province VARCHAR(50),
country VARCHAR(50),
postal_code VARCHAR(20),
effective_date DATE,
expiry_date DATE DEFAULT NULL,
current_flag BOOLEAN DEFAULT TRUE
);
Dimension Table: dim_products (SCD Type 2)
CREATE TABLE dim_products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(255),
product_category VARCHAR(100),
brand VARCHAR(100),
price DECIMAL(10, 2),
effective_date DATE,
expiry_date DATE DEFAULT NULL,
current_flag BOOLEAN DEFAULT TRUE
);
Dimension Table: dim_promotions
CREATE TABLE dim_promotions (
promotion_id SERIAL PRIMARY KEY,
promotion_name VARCHAR(255),
discount_percentage DECIMAL(5, 2),
start_date DATE,
end_date DATE
);
Dimension Table: dim_fulfillment_centers
CREATE TABLE dim_fulfillment_centers (
fulfillment_center_id SERIAL PRIMARY KEY,
center_name VARCHAR(100),
city VARCHAR(50),
state_province VARCHAR(50),
country VARCHAR(50),
postal_code VARCHAR(20)
);
Dimension Table: dim_order_status
CREATE TABLE dim_order_status (
order_status_id SERIAL PRIMARY KEY,
status_name VARCHAR(50)
);
Reasoning for Table Choices
fact_orders and fact_order_line_items: These tables are central to tracking all order activities, including promotions and shipment details. Each order may have multiple line items, making it necessary to split the order details from the item-level details.
fact_inventory_levels: This table captures the inventory levels of products at different fulfillment centers, allowing the company to track stock levels for each product on a daily basis. This is crucial for efficient inventory management and customer order fulfillment.
dim_customers: Merging the customer and address data simplifies queries while maintaining historical data on address changes (handled via SCD Type 2).
dim_products: This dimension uses SCD Type 2 to track price changes over time. We ensure historical accuracy when reporting on sales at a specific time.
dim_promotions: Adding a promotions table allows tracking of discounts applied to both orders and line items. This is critical for understanding the effectiveness of promotional campaigns.
dim_fulfillment_centers: Given that all orders must be routed through a single fulfillment center, this table captures the necessary fulfillment data. Orders across multiple centers are aggregated at one center before being shipped to the customer.
dim_order_status: Tracks the status of orders from pending to shipped or canceled, ensuring visibility into the order lifecycle.
ERD for Dimensional Model
Conclusion
This dimensional model is tailored for e-commerce operations like an online Walmart store. The combination of fact tables for orders, line items, and inventory levels, along with the dimension tables for customers, products, promotions, and fulfillment centers, ensures fast and efficient querying. By using SCD Type 2, we capture historical changes, enabling accurate trend analysis and KPI tracking.
This model supports a wide range of business needs, from tracking customer orders and inventory levels to analyzing promotional effectiveness. The flexibility and speed of dimensional modeling make it ideal for the fast-paced analytics environment of large-scale e-commerce.
Thanks for reading!
Photo Credit: By AS Photography via Pexels
Subscribe to my newsletter
Read articles from Analytics Engineering Digest directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Analytics Engineering Digest
Analytics Engineering Digest
Analytics Engineer Digest is dedicated to sharing expert insights and practical knowledge in the field of analytics engineering. With a focus on data modeling, data warehousing, ETL/ELT processes, and the creation of high-quality datasets, this blog serves as a resource for building scalable data architectures that drive business success. Readers can explore real-world data models, interview preparation tips, and best practices for crafting metrics, dashboards, and automated workflows using SQL, Python, and AWS tools. Analytics Engineer Digest is committed to empowering data professionals to elevate their skills and unlock the full potential of their analytics projects.