Preliminary Review of Retail Dataset on Kaggle: Insights for Segmentation, Customer Analytics, and Clustering Across Global Markets

Mohammed Taliat

Introduction

This report provides an initial review of a sales dataset containing detailed order records for motorcycles and cars placed between 2003 and 2005, including quantities ordered, prices, sales amounts, order dates, customer details, and geographic distribution. The transactions come from customers across several territories: North America (NA); Europe, Middle East, and Africa (EMEA); Asia-Pacific (APAC); and Japan. The purpose of this review is to become familiar with the dataset, explore it, and identify initial insights to guide further analysis.

Observations

Dataset Familiarization

A cursory look at the dataset reveals a comprehensive structure with key variables such as order number, quantity ordered, price per item, sales amount, order status, product line, customer details, and geographic information. The data includes both numerical and categorical types, with orders categorized by product lines such as Motorcycles and Classic Cars, and by deal sizes ranging from small to large. Some of the numerical variables identified are SALES, QTR_ID, MONTH_ID, YEAR_ID, and MSRP, while the categorical variables include ORDERDATE, STATUS, PRODUCTLINE, PRODUCTCODE, CUSTOMERNAME, and DEALSIZE.
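The split into numerical and categorical variables can be checked programmatically. The following is a minimal sketch, assuming the data is available as CSV text; it uses only the Python standard library. The sample rows, the date format, and the helper names `is_number` and `classify_columns` are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only.
# Column names follow those mentioned in the report.
SAMPLE_CSV = """SALES,MONTH_ID,YEAR_ID,MSRP,ORDERDATE,STATUS,PRODUCTLINE,DEALSIZE
2871.00,2,2003,95,2/24/2003 0:00,Shipped,Motorcycles,Small
2765.90,5,2003,95,5/7/2003 0:00,Shipped,Classic Cars,Small
3884.34,7,2004,118,7/1/2004 0:00,On Hold,Classic Cars,Medium
"""


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(rows):
    """Split column names into numerical and categorical lists.

    A column counts as numerical only if every non-empty value parses
    as a number; everything else is treated as categorical.
    """
    numerical, categorical = [], []
    for name in rows[0].keys():
        values = [row[name] for row in rows if row[name] != ""]
        if values and all(is_number(v) for v in values):
            numerical.append(name)
        else:
            categorical.append(name)
    return numerical, categorical


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
numerical, categorical = classify_columns(rows)
print("Numerical:", numerical)
print("Categorical:", categorical)
```

Under this rule ORDERDATE lands in the categorical group because it is read as text. For time-based analysis it would likely be more useful to parse it into a date first.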

Initial Data Exploration

A review of the dataset reveals that the orders are distributed across multiple years with varying quantities and prices. There are only two main product categories, Motorcycles and Classic Cars. Customers are located in various regions: North America (NA); Europe, Middle East, and Africa (EMEA); Asia-Pacific (APAC); and Japan. Orders are also grouped by deal size and by status, with statuses such as Shipped, Cancelled, On Hold, Disputed, and In Process.
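One simple way to carry out this kind of exploration is to count how often each category value appears. The sketch below assumes the orders have already been loaded as a list of dictionaries. The records and the helper name `value_counts` are hypothetical and serve only to show the technique.

```python
from collections import Counter

# Hypothetical records, invented for illustration; not real dataset rows.
orders = [
    {"PRODUCTLINE": "Motorcycles", "STATUS": "Shipped", "DEALSIZE": "Small"},
    {"PRODUCTLINE": "Classic Cars", "STATUS": "Shipped", "DEALSIZE": "Medium"},
    {"PRODUCTLINE": "Classic Cars", "STATUS": "Cancelled", "DEALSIZE": "Large"},
    {"PRODUCTLINE": "Motorcycles", "STATUS": "On Hold", "DEALSIZE": "Small"},
    {"PRODUCTLINE": "Classic Cars", "STATUS": "Shipped", "DEALSIZE": "Medium"},
]


def value_counts(records, column):
    """Count how many records carry each distinct value of a column."""
    return Counter(record[column] for record in records)


# Print the frequency of each value, most common first.
for column in ("PRODUCTLINE", "STATUS", "DEALSIZE"):
    print(column, value_counts(orders, column).most_common())
```

Running the same counts on the full dataset would show how many distinct product lines and order statuses are actually present.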

Initial Insights

The dataset shows that sales occurred across different product lines from 2003 to 2005, with variations in sales patterns by region. The dataset includes a mix of small, medium, and large deals, suggesting diverse customer segments with different purchasing capacities.

Conclusion

This initial review of the sales dataset provides a foundational understanding of its structure, key variables, and observable patterns. The preliminary insights identified in this report will guide deeper analysis to answer specific business questions and inform strategic decisions. Further analysis is needed to identify the product line with the highest sales across the years, whether seasonal patterns influence sales in different regions, and what consumer preferences the data reveals.
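As a starting point for that further analysis, the questions about top product lines and seasonality could be approached by grouping and summing sales. The following is a rough sketch under stated assumptions, not a finished analysis. The records are invented, `TERRITORY` is an assumed column name for the region, and the helpers `total_sales_by` and `top_product_line_per_year` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration; not real dataset rows.
# TERRITORY is an assumed column name for the customer's region.
orders = [
    {"YEAR_ID": 2003, "MONTH_ID": 2, "TERRITORY": "NA",
     "PRODUCTLINE": "Motorcycles", "SALES": 2000.0},
    {"YEAR_ID": 2003, "MONTH_ID": 11, "TERRITORY": "EMEA",
     "PRODUCTLINE": "Classic Cars", "SALES": 3500.0},
    {"YEAR_ID": 2004, "MONTH_ID": 11, "TERRITORY": "EMEA",
     "PRODUCTLINE": "Classic Cars", "SALES": 4000.0},
    {"YEAR_ID": 2004, "MONTH_ID": 5, "TERRITORY": "NA",
     "PRODUCTLINE": "Motorcycles", "SALES": 4500.0},
    {"YEAR_ID": 2004, "MONTH_ID": 11, "TERRITORY": "NA",
     "PRODUCTLINE": "Motorcycles", "SALES": 1000.0},
]


def total_sales_by(records, *keys):
    """Sum SALES for each distinct combination of the given columns."""
    totals = defaultdict(float)
    for record in records:
        totals[tuple(record[key] for key in keys)] += record["SALES"]
    return dict(totals)


def top_product_line_per_year(records):
    """Return {year: (product line, total sales)} for the best seller."""
    best = {}
    by_year_line = total_sales_by(records, "YEAR_ID", "PRODUCTLINE")
    for (year, line), sales in by_year_line.items():
        if year not in best or sales > best[year][1]:
            best[year] = (line, sales)
    return best


# Which product line sold the most in each year?
print(top_product_line_per_year(orders))

# Sales per territory and month, a first look at seasonal patterns.
print(total_sales_by(orders, "TERRITORY", "MONTH_ID"))
```

Comparing the monthly totals within each territory across several years would indicate whether a peak recurs in the same months, which is what a seasonal pattern would look like.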
