Pandas Data Analysis

Kartik Sharma

This week, I dove deep into the world of data analysis using Pandas — one of the most powerful and beginner-friendly Python libraries for working with structured data. I didn’t aim to master the syntax or teach anyone else, but instead to understand how real-world data "feels" when you work with it, and to develop some raw insights through exploration.


What I Built

I worked on three real-world projects, each using public Kaggle datasets:

1) IMDb Dataset Analysis

→ Extracted top-rated movies, genre trends, decade-wise analysis, and director/actor breakdowns.

2) Netflix Dataset Analysis

3) Steam Games Insight

→ Analyzed user ratings, game popularity, release patterns, and pricing strategies in the gaming industry.
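
To give a feel for the kind of questions listed above (top-rated titles, genre trends, decade-wise breakdowns), here is a minimal sketch. The rows are toy data invented for illustration, and the column names `title`, `year`, `genre`, and `rating` are assumptions; the real Kaggle files will likely use different names.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from any actual dataset.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "year": [1994, 1999, 2008, 2010, 2014],
    "genre": ["Drama", "Sci-Fi", "Action", "Sci-Fi", "Drama"],
    "rating": [9.0, 8.5, 9.0, 8.0, 7.0],
})

# Bucket each release year into its decade (1994 -> 1990).
movies["decade"] = (movies["year"] // 10) * 10

# Decade-wise analysis: mean rating and number of titles per decade.
by_decade = movies.groupby("decade")["rating"].agg(["mean", "count"])

# Top-rated titles: the three rows with the highest rating.
top_rated = movies.nlargest(3, "rating")[["title", "rating"]]

# Genre trend: average rating per genre, highest first.
genre_trend = (
    movies.groupby("genre")["rating"].mean().sort_values(ascending=False)
)

print(by_decade)
print(top_rated)
print(genre_trend)
```

The same three patterns (derive a grouping column, aggregate with `groupby`, rank with `nlargest` or `sort_values`) should carry over to the Netflix and Steam data once the column names are swapped in.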

Resources I Used


Problems I Faced

  • Data cleaning was messy. Each dataset had quirks (missing values, mixed data types, duplicate rows) that slowed me down at first.

  • Column naming conventions differed, making it tough to generalize across projects.

  • I struggled to visualize relationships early on, before I learned how to structure .groupby() calls and pivot tables properly.

  • Analysis paralysis: I initially tried to answer too many questions. Narrowing my focus made things much clearer.
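
The problems above can be sketched in a few lines. This is only an illustration under stated assumptions: the rows are invented, the column names are hypothetical, and `normalize_columns` is a helper I made up for this example, not something pandas provides.

```python
import pandas as pd

# Toy data invented for illustration, with the quirks described above:
# a duplicate row, a missing price, and a price column of mixed content.
raw = pd.DataFrame({
    "Game Title": ["Alpha", "Alpha", "Beta", "Gamma", "Delta"],
    "Release Year": [2019, 2019, 2020, 2020, 2021],
    "Genre": ["RPG", "RPG", "RPG", "Indie", "Indie"],
    "Price (USD)": ["19.99", "19.99", "free", None, "9.99"],
})


def normalize_columns(df):
    """Hypothetical helper: return a copy with snake_case column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    return out


# One naming convention across projects, then drop exact duplicate rows.
games = normalize_columns(raw).drop_duplicates().reset_index(drop=True)

# Mixed types: coerce to numbers; anything unparseable ("free") becomes NaN.
# Whether "free" should instead be 0.0 is a data decision, not a given.
games["price_usd"] = pd.to_numeric(games["price_usd"], errors="coerce")

# groupby: one aggregate per group (mean skips NaN by default).
mean_price_by_genre = games.groupby("genre")["price_usd"].mean()

# Pivot table: number of titles per release year and genre.
titles_by_year_genre = games.pivot_table(
    index="release_year",
    columns="genre",
    values="game_title",
    aggfunc="count",
    fill_value=0,
)

print(games)
print(mean_price_by_genre)
print(titles_by_year_genre)
```

Running the same normalization on every dataset first is one way to make later code reusable across projects, since every downstream step can rely on predictable column names.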


Key Observations

  • Pandas is extremely intuitive once you stop thinking like a coder and start thinking like a data analyst.

  • Small data decisions (like how to handle nulls) shape the entire outcome of a project.

  • Visualization and storytelling should start earlier in the project, not just after analysis is complete.

  • Real-world datasets are rarely clean. Handling that mess is the skill.
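
To make the point about nulls concrete, here is a small sketch with invented ratings (not from any of the datasets) showing how three common choices for missing values lead to three different averages.

```python
import pandas as pd

# Toy ratings invented for illustration; None marks a missing rating.
ratings = pd.Series([8.0, 7.0, None, None, 3.0], name="rating")

# Option 1: ignore missing values (pandas skips NaN in mean() by default).
mean_skipping_nulls = ratings.mean()

# Option 2: treat a missing rating as a rating of zero.
mean_nulls_as_zero = ratings.fillna(0).mean()

# Option 3: fill missing ratings with the median of the known ones.
mean_nulls_as_median = ratings.fillna(ratings.median()).mean()

print(mean_skipping_nulls, mean_nulls_as_zero, mean_nulls_as_median)
```

None of the three is correct in general; which one fits depends on what a missing value means in the dataset at hand.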

