Exploring the Retail Sales Kaggle Dataset

Pauline B.

Hello everyone! 👋 I’m excited to share insights from my first task in the HNG11 internship Data Analytics track. Let’s review the dataset and see what we can discover!

Introduction

The HNG internship is a fast-paced bootcamp for learning digital skills such as Software Development, Data Analytics, Software Testing, DevOps, and Design, to name a few. It also offers networking opportunities, collaboration with fellow tech enthusiasts, and access to exclusive job openings through the HNG premium network.

This task involves analyzing a dataset consisting of 2823 rows and 25 columns, containing details about individual orders, customer information, product data, and sales. The goal is to understand the dataset's structure and derive initial insights from a preliminary exploration.

Observations

Diverse Data Types

  • The dataset includes 25 columns: 16 categorical (e.g., order status, product line, country) and 9 numerical (e.g., order quantity, price, year).
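One way to arrive at a categorical/numerical split like this is to group columns by their pandas dtype. The sketch below assumes pandas is used, as in the rest of this post; the column names and values in `toy_df` are placeholders for illustration, not the full 25-column dataset.

```python
import pandas as pd


def split_columns_by_kind(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (numerical, categorical) column names based on pandas dtypes."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical


# Tiny made-up frame for illustration; column names are assumptions.
toy_df = pd.DataFrame(
    {
        "QUANTITYORDERED": [30, 34],
        "PRICEEACH": [95.70, 81.35],
        "STATUS": ["Shipped", "Shipped"],
        "COUNTRY": ["USA", "France"],
    }
)

numerical_cols, categorical_cols = split_columns_by_kind(toy_df)
print(len(numerical_cols), "numerical;", len(categorical_cols), "categorical")
```

Note that a dtype-based split treats anything non-numeric as categorical, so a date stored as text would also land in the categorical group until it is converted.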

Missing Values ⚠️

  • Several columns, such as additional addresses, state information, postal codes, and territory details, have missing values, which may limit some aspects of the analysis.

      # Count the missing values in each column
      null_count = sales_df.isnull().sum()

      # Keep only the columns that have at least one missing value
      null_count[null_count > 0]

      # Result
      # ADDRESSLINE2    2521
      # STATE           1486
      # POSTALCODE        76
      # TERRITORY       1074

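To judge how much each gap matters, the counts can be expressed as a share of the 2823 rows. This is a small sketch that reuses only the numbers reported in this post; the variable names are illustrative.

```python
import pandas as pd

TOTAL_ROWS = 2823  # row count stated in the introduction

# Missing-value counts copied from the result above.
null_count = pd.Series(
    {"ADDRESSLINE2": 2521, "STATE": 1486, "POSTALCODE": 76, "TERRITORY": 1074}
)

# Share of rows with a missing value, as a percentage rounded to one decimal.
missing_pct = (null_count / TOTAL_ROWS * 100).round(1)
print(missing_pct.sort_values(ascending=False))
```

On a loaded DataFrame, `sales_df.isnull().mean() * 100` should give the same percentages directly, without hard-coding the counts.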
Varied Order Sizes and Prices 💸

  • Order sizes range from 6 to 97 items, with an average of 35 items per order.

  • Item prices vary from $26.88 to $100, averaging about $83.66.

Sales Figures 📊

  • Sales amounts range from $482.13 to $14,082.80, with an average sale of approximately $3,553.89.

  • Manufacturer's suggested retail prices (MSRP) vary from $33 to $214, averaging around $100.

Figure: snippet generated from describing the dataset
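Ranges and averages like the ones above typically come from `DataFrame.describe()`. Here is a minimal sketch, assuming the numerical columns are named `QUANTITYORDERED`, `PRICEEACH`, `SALES`, and `MSRP`; the three rows are made up for illustration, so the printed values will not match the figures quoted in this post.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
toy_numeric = pd.DataFrame(
    {
        "QUANTITYORDERED": [30, 34, 41],
        "PRICEEACH": [95.70, 81.35, 94.74],
        "SALES": [2871.00, 2765.90, 3884.34],
        "MSRP": [95, 95, 95],
    }
)

# describe() returns count, mean, std, min, quartiles, and max per column;
# keep only the rows discussed in the text.
summary = toy_numeric.describe().loc[["min", "mean", "max"]].round(2)
print(summary)
```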

Product Diversity 🛍️

  • The dataset includes multiple product types, providing an opportunity to analyze sales performance across different categories.

Figure: Product Performance
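A comparison across categories could start with a simple group-by. The sketch below assumes the product type is stored in a `PRODUCTLINE` column and the revenue in a `SALES` column; the category names and amounts are invented for illustration.

```python
import pandas as pd

# Made-up rows for illustration; column names and values are assumptions.
toy_sales = pd.DataFrame(
    {
        "PRODUCTLINE": ["Classic Cars", "Motorcycles", "Classic Cars", "Trucks and Buses"],
        "SALES": [3000.0, 1500.0, 2000.0, 2500.0],
    }
)

# Number of order lines, total sales, and average sale per product line,
# sorted so the best-selling line comes first.
sales_by_line = (
    toy_sales.groupby("PRODUCTLINE")["SALES"]
    .agg(["count", "sum", "mean"])
    .sort_values("sum", ascending=False)
)
print(sales_by_line)
```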

Seasonal Trends 📅

  • The dataset covers sales throughout the year, with a notable concentration around the middle of the year.

  • Data mainly pertains to the early 2000s, specifically between 2003 and 2005, with most entries around 2003.

Figure: temporal trend
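To look at the distribution over time, the `ORDERDATE` column can be parsed into datetimes and counted by year and month. This sketch assumes the dates are stored as text in a month/day/year format with a time component; both the format string and the sample values are assumptions and should be checked against the actual file.

```python
import pandas as pd

# Made-up date strings for illustration; the format is an assumption.
toy_dates = pd.DataFrame(
    {
        "ORDERDATE": [
            "2/24/2003 0:00",
            "5/7/2003 0:00",
            "11/5/2004 0:00",
            "11/20/2004 0:00",
        ]
    }
)

# Convert the text column to datetimes so the .dt accessor becomes available.
order_date = pd.to_datetime(toy_dates["ORDERDATE"], format="%m/%d/%Y %H:%M")

# Count entries per year and per calendar month.
orders_per_year = order_date.dt.year.value_counts().sort_index()
orders_per_month = order_date.dt.month.value_counts().sort_index()
print(orders_per_year)
print(orders_per_month)
```

Counting by month rather than reading the average month from a summary table makes any concentration in particular months easier to see.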

Conclusion

The initial review of the sample sales dataset has highlighted several key areas for further exploration. A detailed analysis should focus on sales performance by product line, time period, and geography. Handling missing values and converting columns to the correct data types will be essential for a more precise analysis. Continued investigation should yield deeper insights into the sales data and help identify significant trends and patterns.

Potential Areas for Further Analysis

  1. Sales Performance:

    • Examine sales figures by month, quarter, and year to understand how sales are distributed over time.
  2. Product Analysis:

    • Investigate the relationships between product types, prices, and sales figures.
  3. Customer Segmentation:

    • Identify customer segments based on order behavior (quantity, frequency) and demographics (location).
  4. Data Cleaning and Preprocessing:

    • Resolve missing values, convert appropriate columns to numerical data types, and properly format the ORDERDATE column for time-series analysis.
  5. Geographical Insights:

    • Utilize the customer location data to conduct a geographical analysis of sales performance.
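As a starting point for the customer segmentation idea, orders can be summarised per customer and split on a simple threshold. This is only a sketch under assumptions: the column names (`CUSTOMERNAME`, `ORDERNUMBER`, `QUANTITYORDERED`, `SALES`, `COUNTRY`) and the rows are hypothetical, and the median split is an illustrative rule rather than a recommended segmentation method.

```python
import numpy as np
import pandas as pd

# Made-up order lines for illustration; column names are assumptions.
toy_orders = pd.DataFrame(
    {
        "CUSTOMERNAME": ["Customer A", "Customer A", "Customer B", "Customer C"],
        "ORDERNUMBER": [10100, 10101, 10102, 10103],
        "QUANTITYORDERED": [30, 40, 20, 25],
        "SALES": [1000.0, 2000.0, 500.0, 1500.0],
        "COUNTRY": ["USA", "USA", "France", "Spain"],
    }
)

# One row per customer: distinct orders, total quantity, total sales, country.
customer_summary = toy_orders.groupby("CUSTOMERNAME").agg(
    order_count=("ORDERNUMBER", "nunique"),
    total_quantity=("QUANTITYORDERED", "sum"),
    total_sales=("SALES", "sum"),
    country=("COUNTRY", "first"),
)

# Illustrative rule: customers above the median total sales are "high value".
median_sales = customer_summary["total_sales"].median()
customer_summary["segment"] = np.where(
    customer_summary["total_sales"] > median_sales, "high value", "standard"
)
print(customer_summary)
```

Grouping the same summary by `country` instead of by customer would give a first view of the geographical distribution of sales.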

In conclusion, the HNG11 internship provides a great opportunity to develop data analytics skills and gain practical experience. By exploring and analyzing this dataset, we can uncover valuable insights that can drive business decisions and strategies.

Thanks for reading! 😊
