This week, I dove deep into data analysis with Pandas, one of the most powerful and beginner-friendly Python libraries for working with structured data. My aim wasn't to master the syntax or to teach anyone else, but to understand how real-world data "feels" when you work with it, and to develop some raw insights through exploration.

What I Built

I worked on three real-world projects, each using public Kaggle datasets:

1) IMDb Dataset Analysis

Extracted top-rated movies, genre trends, decade-wise analysis, and director/actor breakdowns.

https://github.com/Kartikshard/Pandas-analysis/blob/main/Imdb%20Analysis.ipynb

2) Netflix Dataset Analysis

Explored global content distribution, genre frequency, release timelines, and show vs. movie split.

https://github.com/Kartikshard/Pandas-analysis/blob/main/Netflx_Analysis.ipynb

3) Steam Games Insight

Analyzed user ratings, game popularity, release patterns, and pricing strategies in the gaming industry.

https://github.com/Kartikshard/Pandas-analysis/blob/main/Games.ipynb
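
To give a feel for what a decade-wise breakdown like the one in the IMDb project involves, here is a minimal sketch. It runs on a tiny made-up table, not the actual Kaggle data, and the column names (title, year, rating) are assumptions chosen for illustration rather than the real schema from the notebooks.

```python
import pandas as pd

# Hypothetical sample data; titles, years, and ratings are invented for illustration.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "year": [1994, 1999, 2008, 2010, 2014],
    "rating": [9.3, 8.8, 9.0, 8.7, 8.6],
})

# Floor each year to the start of its decade (1994 -> 1990).
movies["decade"] = (movies["year"] // 10) * 10

# Average rating and number of titles per decade.
by_decade = movies.groupby("decade")["rating"].agg(["mean", "count"])

# Three highest-rated titles, best first.
top_rated = movies.sort_values("rating", ascending=False).head(3)

print(by_decade)
print(top_rated[["title", "rating"]])
```

The integer-division trick keeps the decade as a plain number, which sorts naturally and can be used directly as a group key.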

Resources I Used

IMDB Dataset of 50K Movie Reviews
Steam Games Dataset
Netflix Movies and TV Shows Dataset
Stack Overflow (a lot!)
Official Pandas documentation

Problems I Faced

Data cleaning was messy. Each dataset had its own quirks (missing values, mixed data types, duplicate rows) that slowed me down at first.
Column naming conventions differed between datasets, which made it hard to reuse code across projects.
I struggled to visualize relationships early on, before I learned how to structure .groupby() calls and pivot tables properly.
Analysis paralysis: I initially tried to answer too many questions. Narrowing my focus made things much clearer.
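
The cleaning and grouping problems above can be sketched roughly like this. It is one possible approach under stated assumptions, not the code from the notebooks: the sample rows, the column names, and the clean() helper are all hypothetical.

```python
import pandas as pd

# Hypothetical raw data with the quirks described above; every name here is illustrative.
raw = pd.DataFrame({
    "Title ": ["Alpha", "Alpha", "Beta", "Gamma", "Delta"],
    "Content Type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "Release Year": ["2019", "2019", "2020", "unknown", "2020"],
    "Rating": [7.5, 7.5, None, 6.0, 8.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: normalize names, drop duplicates, fix mixed types."""
    out = df.copy()
    # Normalize column names to snake_case so the same code works across datasets.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Drop exact duplicate rows.
    out = out.drop_duplicates()
    # Coerce mixed-type years to numbers; values that cannot be parsed become NaN.
    out["release_year"] = pd.to_numeric(out["release_year"], errors="coerce")
    return out.reset_index(drop=True)


df = clean(raw)

# groupby: one aggregate per group (missing ratings are skipped by mean()).
mean_rating = df.groupby("content_type")["rating"].mean()

# pivot_table: groups on two axes at once, here counting titles by year and type.
counts = df.pivot_table(
    index="release_year",
    columns="content_type",
    values="title",
    aggfunc="count",
    fill_value=0,
)

print(mean_rating)
print(counts)
```

Normalizing column names first is what makes the rest reusable: once every dataset exposes snake_case names, the same groupby and pivot calls can be pointed at any of them. Note that the row with an unparseable year drops out of the pivot table, because rows with a missing group key are excluded by default.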

Key Observations

Pandas is extremely intuitive once you stop thinking like a coder and start thinking like a data analyst.
Small data decisions (like how to handle nulls) shape the entire outcome of a project.
Visualization and storytelling should start earlier in the project, not just after analysis is complete.
Real-world datasets are rarely clean. Handling that mess is the skill.
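
To make the point about nulls concrete, here is a small sketch showing how three common choices change the same summary statistic. The prices are made up, and treating a missing price as 0 is only an assumption for the sake of the example.

```python
import pandas as pd

# Hypothetical prices with two missing values.
prices = pd.Series([10.0, None, 30.0, None, 20.0], name="price")

# Option 1: drop missing rows, so the mean uses only the known prices.
mean_dropped = prices.dropna().mean()

# Option 2: treat missing as 0 (for example, assuming "free"); this pulls the mean down.
mean_zero_filled = prices.fillna(0).mean()

# Option 3: fill with the median of the known values.
mean_median_filled = prices.fillna(prices.median()).mean()

print(mean_dropped, mean_zero_filled, mean_median_filled)
```

With these numbers the three means come out as 20.0, 12.0, and 20.0. The data is identical in each case; only the null-handling decision differs.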

Let's connect on LinkedIn

Check out the GitHub repo

Pandas Data Analysis