First Impression: Exploring retail sales data
Introduction
In this data analysis exercise, the dataset under review comprises order details from a retail business. The primary objective of this analysis is to conduct an initial exploration to identify key variables, understand their types, and uncover any immediate trends or anomalies. By doing so, we aim to lay the groundwork for deeper insights and potential recommendations for further analysis.
This exercise, hosted by HNG Internship, not only enhances our ability to interpret data effectively but also sharpens our skills in presenting findings coherently in a brief technical report.
The Dataset
A brief inspection shows that the dataset consists of 25 columns and 2823 entries.
Four of the columns contain missing values: ADDRESSLINE2 has 2521 nulls, STATE has 1486, POSTALCODE has 76, and TERRITORY has 1074.
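The null counts above can be reproduced with a short script. What follows is a minimal sketch using only the Python standard library, not the code used for the original analysis. The function name `count_missing`, the choice to treat only empty cells as missing, and the three-row sample are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def count_missing(rows, missing_tokens=("",)):
    """Count missing values per column in an iterable of dict rows.

    A cell counts as missing when it is None or, after stripping
    whitespace, equals one of `missing_tokens`. Treating only the empty
    string as missing is an assumption; adjust it to match the file.
    """
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() in missing_tokens:
                missing[column] += 1
    return dict(missing)


# Hypothetical three-row sample standing in for the real CSV file.
SAMPLE_CSV = """ORDERNUMBER,ADDRESSLINE2,STATE,POSTALCODE,TERRITORY
10107,,NY,10022,
10121,,,51100,EMEA
10134,Level 3,,75508,EMEA
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_counts = count_missing(rows)
print(missing_counts)
```

Columns with no missing cells are simply absent from the result. A literal `NA` is deliberately not treated as missing in this sketch, since in a column such as TERRITORY it could plausibly be a real label rather than a null.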
Now I will be looking at the distributions of the contents within our columns.
Categorical Columns:
ORDERS: 307 unique ORDERNUMBERs and 18 unique ORDERLINENUMBERs recorded.
STATUS: 6 unique values ('Shipped', 'Disputed', 'In Process', 'Cancelled', 'On Hold', 'Resolved'). The most frequent status is 'Shipped' with 2617 records.
PRODUCTLINE: 7 unique values ('Motorcycles', 'Classic Cars', 'Trucks and Buses', 'Vintage Cars', 'Planes', 'Ships', 'Trains'). The most frequent product line is 'Classic Cars' with 967 entries.
PRODUCTCODE: 109 unique product codes were recorded. The most frequent product code is 'S18_3232' with 52 records.
LOCATION DATA: 73 unique cities, 16 unique states ('STATE'), and 19 unique countries ('COUNTRY'). The most frequent city is 'Madrid' with 304 records, the most frequent state is 'CA' with 416 records, and the most frequent country is 'USA' with 1004 records.
TERRITORY: 3 unique territories recorded. The most frequent territory is 'EMEA' with 1407 records.
DEALSIZE: 3 unique deal sizes recorded. The most frequent deal size is 'Medium' with 1384 records.
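Each categorical summary above boils down to two numbers: how many distinct values a column holds and which value appears most often. A minimal sketch of that calculation is shown below. The helper name `summarize_categorical` and the sample values are hypothetical, and blank cells are assumed to be missing and are skipped.

```python
from collections import Counter


def summarize_categorical(values):
    """Return (unique count, most frequent value, its frequency).

    Blank and None entries are ignored, on the assumption that they
    represent missing data rather than a category of their own.
    """
    counts = Counter(v for v in values if v not in (None, ""))
    if not counts:
        return 0, None, 0
    top_value, top_count = counts.most_common(1)[0]
    return len(counts), top_value, top_count


# Hypothetical STATUS values standing in for the real column.
status_column = ["Shipped", "Shipped", "Cancelled", "Shipped", "On Hold", ""]
summary = summarize_categorical(status_column)
print(summary)
```

Running the same helper over each categorical column in turn would give the unique counts and top values in the style reported above.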
Numerical and Date Columns:
ORDERDATE: Data spans from 2003-01-06 to 2005-05-31, covering just under two and a half years.
QUANTITYORDERED: Ranges from 6 to 97 units ordered. The average quantity ordered is 35 units.
PRICEEACH: Prices range from $26.88 to $100.00 per unit. The average price per unit is $83.66, and the median is $95.70, so half of the order lines have a unit price at or below $95.70.
SALES: Sales range from $482.13 to $14,082.80. The average sale amount is $3,553.89.
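The ranges, averages, and midpoints listed for the numerical columns are standard descriptive statistics. As a rough sketch of how they can be computed with the standard library, the example below uses a hypothetical helper, `summarize_numeric`, on a handful of made-up prices rather than the real column.

```python
from statistics import mean, median


def summarize_numeric(values):
    """Return the min, max, mean and median of a sequence of numbers."""
    data = sorted(float(v) for v in values)
    if not data:
        raise ValueError("no values to summarize")
    return {
        "min": data[0],
        "max": data[-1],
        "mean": mean(data),
        "median": median(data),
    }


# Hypothetical PRICEEACH values; the real column has 2823 entries.
price_each = [26.88, 95.70, 100.00, 100.00, 81.35]
stats = summarize_numeric(price_each)
print(stats)
```

In this toy sample the median sits above the mean, the same pattern reported for PRICEEACH, which usually suggests that many values cluster near the top of the range while a few low values pull the average down.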
Sales Trend Analysis
The dataset shows peak annual sales of $4.7 million in 2004, with a seasonal spike around November in both 2003 and 2004. Further exploration of these seasonal trends could reveal opportunities for maximizing sales in 2005 and beyond.
The monthly sales values for 2005 compare well with the same months of the previous year, so a new yearly sales high could be expected.
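Yearly totals and seasonal spikes of this kind come from grouping SALES by the year and month of ORDERDATE. The sketch below shows one way to do that grouping. It assumes dates are stored as `YYYY-MM-DD` strings (the real file may use a different format, which can be passed in as `date_format`), and the function names and sample amounts are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def monthly_sales(rows, date_format="%Y-%m-%d"):
    """Sum SALES per (year, month) from rows with ORDERDATE and SALES keys."""
    totals = defaultdict(float)
    for row in rows:
        order_date = datetime.strptime(row["ORDERDATE"], date_format)
        totals[(order_date.year, order_date.month)] += float(row["SALES"])
    # Sort by (year, month) so the result reads chronologically.
    return dict(sorted(totals.items()))


def yearly_sales(monthly_totals):
    """Collapse (year, month) totals into per-year totals."""
    totals = defaultdict(float)
    for (year, _month), amount in monthly_totals.items():
        totals[year] += amount
    return dict(totals)


# Hypothetical rows; the amounts are made up for illustration only.
sample_rows = [
    {"ORDERDATE": "2003-11-05", "SALES": "1000.50"},
    {"ORDERDATE": "2003-11-20", "SALES": "2000.00"},
    {"ORDERDATE": "2004-01-15", "SALES": "500.25"},
    {"ORDERDATE": "2004-11-02", "SALES": "4000.00"},
]

by_month = monthly_sales(sample_rows)
by_year = yearly_sales(by_month)
peak_month = max(by_month, key=by_month.get)
print(by_month)
print(by_year)
print(peak_month)
```

Comparing the same month across years in `by_month` is what supports a year-over-year statement, while `by_year` gives the annual totals. Bear in mind that a partial final year will show a lower total than a complete one.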
Conclusion
In this initial exploration of the retail order dataset, we have identified key variables such as order status, product lines, and geographical distributions. The analysis has uncovered significant trends, including peak annual sales of $4.7 million in 2004 and seasonal spikes in November 2003 and November 2004. These findings lay the groundwork for deeper investigation into the factors driving these trends, potentially offering insights to optimize sales strategies and capitalize on seasonal opportunities in future analyses. This exercise highlights the importance of data-driven insights in enhancing decision-making processes and underscores the value of continued exploration to uncover actionable recommendations for business growth.
Thanks for reading through this work!
To participate in the HNG Internship programme click on this link
You can also find out more about their premium packages and networking opportunities here
Written by Favour Ewoh