Pandas Data Analysis


This week, I dove deep into the world of data analysis using Pandas, one of the most powerful and beginner-friendly Python libraries for working with structured data. I didn’t aim to master the syntax or to teach anyone else. Instead, I wanted to understand how real-world data "feels" when you work with it, and to develop some raw insights through exploration.
What I Built
I worked on three real-world projects, each using public Kaggle datasets:
1) IMDb Dataset Analysis
Extracted top-rated movies, genre trends, decade-wise analysis, and director/actor breakdowns.
2) Netflix Dataset Analysis
Explored global content distribution, genre frequency, release timelines, and show vs. movie split.
3) Steam Games Insight
Analyzed user ratings, game popularity, release patterns, and pricing strategies in the gaming industry.
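As a rough illustration of a few of the operations above, here is a minimal sketch on a tiny made-up sample. The column names ("title", "year", "rating", "type") and all values are assumptions for the example; the actual Kaggle files use their own schemas. It buckets release years into decades for a decade-wise summary, picks the top-rated titles, and computes a movie vs. show split.

```python
import pandas as pd

# Hypothetical sample rows; real Kaggle files use their own column names.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "year": [1994, 1999, 2008, 2010, 2019],
    "rating": [9.3, 8.8, 9.0, 8.7, 8.5],
})

# Decade-wise analysis: floor each year to its decade, then aggregate.
movies["decade"] = (movies["year"] // 10) * 10
by_decade = movies.groupby("decade")["rating"].agg(["count", "mean"])
print(by_decade)

# Top-rated titles: the 3 rows with the highest rating.
top = movies.nlargest(3, "rating")[["title", "rating"]]
print(top)

# Movie vs. show split, as proportions of all titles (hypothetical "type" column).
titles = pd.DataFrame({"type": ["Movie", "TV Show", "Movie", "Movie"]})
split = titles["type"].value_counts(normalize=True)
print(split)
```

The same groupby-then-aggregate pattern carries over to genre trends or pricing by release year; only the grouping key changes.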
Resources I Used
Stack Overflow (a lot!)
Official Pandas documentation
Problems I Faced
Data cleaning was messy. Each dataset had quirks (missing values, mixed data types, duplicate rows) that slowed me down at first.
Column naming conventions differed, making it tough to generalize across projects.
Struggled to visualize relationships early on, before I learned how to properly structure .groupby() calls and pivot tables.
Analysis paralysis: I initially tried to answer too many questions. Narrowing my focus made things much clearer.
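A minimal sketch of a cleaning pass that tackles these problems, on a small invented table. The game names, the column names, and the "Free" price label are all hypothetical and not taken from the Steam dataset. It normalizes column names, drops duplicate rows, coerces a text price column to numbers, and then summarizes the result with .groupby() and with a pivot table.

```python
import pandas as pd

# Hypothetical raw data with the quirks described above: inconsistent
# column names, a price column stored as text, a missing value, and a
# duplicated row.
raw = pd.DataFrame({
    " Game Name ": ["Alpha", "Beta", "Beta", "Gamma", "Delta"],
    "Release Year": [2019, 2020, 2020, 2020, 2019],
    "Genre": ["Action", "Indie", "Indie", "Action", "Indie"],
    "Price": ["9.99", "Free", "Free", "19.99", None],
})

# 1) Normalize column names (trim, lowercase, snake_case) so later code
#    does not depend on how each source file spelled them.
df = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# 2) Drop exact duplicate rows, keeping the first occurrence.
df = df.drop_duplicates()

# 3) Coerce mixed text to numbers: "Free" becomes 0, and anything
#    unparseable (including None) becomes NaN instead of raising.
df["price"] = pd.to_numeric(df["price"].replace("Free", "0"), errors="coerce")
missing = df["price"].isna().sum()

# 4) Summarize: mean price per genre (NaN is skipped by default) ...
mean_by_genre = df.groupby("genre")["price"].mean()

# ... and the same data as a genre x release-year pivot table.
pivot = df.pivot_table(index="genre", columns="release_year",
                       values="price", aggfunc="mean")
print(mean_by_genre)
print(pivot)
```

Doing the renaming first is the design choice that matters here: every later step can refer to the same snake_case names regardless of which dataset the table came from.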
Key Observations
Pandas is extremely intuitive once you stop thinking like a coder and start thinking like a data analyst.
Small data decisions (like how to handle nulls) shape the entire outcome of a project.
Visualization and storytelling should start earlier in the project, not just after analysis is complete.
Real-world datasets are rarely clean. Handling that mess is the skill.
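To make the point about nulls concrete, here is a small sketch with invented ratings showing how three common choices give different answers from the same column. The values are hypothetical; only the pandas calls are standard.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings column with two missing values.
ratings = pd.Series([8.0, 6.0, np.nan, 10.0, np.nan])

# Option A: ignore nulls (Series.mean() uses skipna=True by default).
mean_skip = ratings.mean()            # (8 + 6 + 10) / 3

# Option B: treat a missing rating as 0.
mean_zero = ratings.fillna(0).mean()  # (8 + 6 + 0 + 10 + 0) / 5

# Option C: drop incomplete rows entirely; 2 of the 5 rows disappear.
kept = ratings.dropna()

print(mean_skip, mean_zero, len(kept))
```

Neither option is "correct" in general; the point is that the choice has to be made deliberately, because it changes every number computed downstream.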
Written by Kartik Sharma
