My First Data Science Project: Analyzing Titanic Survival Data with Python

Hi!
Welcome to my very first mini data science project blog!
In this post, I’ll walk you through how I used Python, Pandas, and Seaborn to explore the famous Titanic dataset — a classic beginner-friendly dataset used in data science learning.


🚢 What is the Titanic Dataset?

The Titanic dataset contains data about the passengers on board the RMS Titanic — a ship that tragically sank in 1912.
The dataset includes details like:

  • Passenger age, gender, class

  • Ticket fare

  • Whether they survived or not

Our goal is to explore the data and uncover patterns — such as:
"Who had the best chance of survival?"


📥 Step 1: Import Libraries and Load Data

We’ll use Pandas and Seaborn for data analysis and visualization.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load Titanic dataset
df = sns.load_dataset('titanic')
df.head()

🧾 Step 2: Explore the Data

Let’s check the structure and look for missing values.

df.info()
df.isnull().sum()

Some initial observations:

  • Columns like age, embarked, and deck have missing values.

  • The survived column is our target — 1 means survived, 0 means not.
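Raw null counts are easier to judge as a share of all rows. Here is a minimal sketch of that idea, using a tiny made-up table with the same column names (the values are invented for illustration and are not the real Titanic data):

```python
import pandas as pd

# Tiny made-up sample with Titanic-style column names
# (hypothetical values, for illustration only)
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "age": [22.0, None, 26.0, None, 35.0],
    "deck": [None, "C", None, None, None],
    "embarked": ["S", "C", "S", None, "S"],
})

# isnull() marks missing cells as True; the mean of a boolean column
# is the fraction of True values, i.e. the share of missing rows
missing_share = sample.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

On the real data, the same line with df in place of sample should make it obvious which columns are worth filling in and which are better dropped.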


🧹 Step 3: Data Cleaning

Let’s handle some missing data and keep only useful columns.

# Drop deck (mostly missing), plus embark_town and alive,
# which only repeat the information in embarked and survived
df.drop(['deck', 'embark_town', 'alive'], axis=1, inplace=True)

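# Fill missing age with the median.
# Assign the result back: calling fillna(inplace=True) on a selected column
# is deprecated in recent pandas and may leave df unchanged.
df['age'] = df['age'].fillna(df['age'].median())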

# Drop rows with any remaining nulls
df.dropna(inplace=True)
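If you would rather not modify the loaded table in place, the same cleaning steps can be wrapped in a small function that returns a cleaned copy. This is only a sketch: clean_titanic is a name I made up, and the sample rows are invented so the example runs on its own.

```python
import pandas as pd

def clean_titanic(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left untouched."""
    # errors="ignore" lets this run even if a column is already gone
    cleaned = raw.drop(columns=["deck", "embark_town", "alive"], errors="ignore")
    # Fill missing ages with the median age
    cleaned = cleaned.assign(age=cleaned["age"].fillna(cleaned["age"].median()))
    # Drop the few rows that still contain a missing value
    return cleaned.dropna().reset_index(drop=True)

# Made-up rows with Titanic-style columns (hypothetical values)
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "age": [22.0, None, 26.0, 40.0],
    "embarked": ["S", "C", None, "S"],
    "deck": [None, "C", None, None],
})

cleaned = clean_titanic(sample)
print(cleaned)
print("Remaining nulls:", cleaned.isnull().sum().sum())
```

Checking the remaining null count right after cleaning is a cheap way to confirm that every step did what you expected.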

📊 Step 4: Data Visualization

Let’s find some interesting insights.

1. Survival Count

sns.countplot(x='survived', data=df)
plt.title('Survival Count')
plt.show()

2. Survival by Gender

sns.countplot(x='sex', hue='survived', data=df)
plt.title('Survival by Gender')
plt.show()

3. Survival by Class

sns.countplot(x='pclass', hue='survived', data=df)
plt.title('Survival by Passenger Class')
plt.show()

4. Age Distribution

sns.histplot(data=df, x='age', bins=30, kde=True)
plt.title('Age Distribution of Passengers')
plt.show()

📈 Step 5: What Did I Learn?

🔍 Insights:

  • Women survived at a much higher rate than men.

  • First-class passengers had a higher survival rate.

  • Children seemed to have better chances, though the age histogram alone doesn't show this; it needs a comparison of age against survived.
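Count plots show how many people are in each group, but the insights above are really about rates. Since survived is 0 or 1, its mean within a group is that group's survival rate. The sketch below shows the idea on a small made-up table; the values and the age bands are my own assumptions, not results from the real dataset.

```python
import pandas as pd

# Made-up rows with Titanic-style columns (hypothetical values)
sample = pd.DataFrame({
    "survived": [1, 1, 0, 0, 0, 1, 0, 1],
    "sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "pclass": [1, 2, 3, 3, 3, 1, 2, 1],
    "age": [29.0, 8.0, 40.0, 35.0, 22.0, 4.0, 60.0, 50.0],
})

# Mean of a 0/1 column = share of 1s = survival rate per group
rate_by_sex = sample.groupby("sex")["survived"].mean()
rate_by_class = sample.groupby("pclass")["survived"].mean()

# Bucket ages into bands (the cut points are an arbitrary choice)
age_band = pd.cut(sample["age"], bins=[0, 12, 60, 120],
                  labels=["child", "adult", "senior"])
rate_by_age = sample.groupby(age_band, observed=True)["survived"].mean()

print(rate_by_sex)
print(rate_by_class)
print(rate_by_age)
```

Running the same three groupby lines on the cleaned df gives the actual numbers behind each insight.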

This small project helped me understand:

  • How to load and clean data

  • How to find patterns visually

  • How real-world data can tell powerful stories!


🧠 What’s Next?

In my next blog post, I plan to build a basic machine learning model on this Titanic dataset to actually predict survival!
Step by step, I’ll keep growing my data science skills, and I hope you’ll follow along.

Thanks for reading 💛
Feel free to try this project yourself and share your results!


— Farsana | Data Science Intern | Python + Pandas + Curiosity 🚀


Written by

Farsana Thasnem PA

Aspiring Data Scientist | Physics Graduate | Passionate about Machine Learning, Python, and Data Storytelling. Sharing my journey, projects, and learnings in the world of data science.