Predicting Titanic Survivors with Machine Learning: A Beginner's Guide


Welcome back to my blog!
In my previous post, I explored the Titanic dataset using Python and visualized survival trends.
Now it's time to take the next step: building a machine learning model that predicts whether a passenger survived.
This is my first real ML project, and I’ll walk you through it step by step!
⚙️ Step 1: Import Libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
📥 Step 2: Load the Dataset
df = sns.load_dataset('titanic')
df.head()
🧹 Step 3: Data Cleaning
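# Drop columns that are mostly missing (deck) or that duplicate other columns:
# embark_town repeats embarked, class repeats pclass, who and adult_male are derived
# from sex and age, and alive is just the target (survived) written as yes/no
df.drop(['deck', 'embark_town', 'alive', 'who', 'adult_male', 'class'], axis=1, inplace=True)
# Drop rows with missing values (in the remaining columns these are in age and embarked)
df.dropna(inplace=True)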
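Since dropna() throws away every row that has at least one gap, it can be worth counting the missing values first. Here is a minimal sketch on a small hand-made DataFrame (toy values I made up for illustration, not the real Titanic data) that compares dropping rows with filling age by its median. The median fill is just one common alternative, not what this post uses.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame (hypothetical values, for illustration only)
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
    "embarked": ["S", "C", "S", None, "Q"],
})

# Count missing values per column before deciding how to clean
missing_counts = toy.isna().sum()
print(missing_counts)

# Option 1 (what this post does): drop every row that has any missing value
dropped = toy.dropna()

# Option 2 (an alternative): fill age with its median, then drop what is still missing
filled = toy.assign(age=toy["age"].fillna(toy["age"].median())).dropna()

# Number of rows: original, after dropping, after filling age
print(len(toy), len(dropped), len(filled))
```

On the toy frame, filling age keeps more rows than dropping, which is the trade-off to weigh on the real dataset too.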
🔠 Step 4: Encoding Categorical Features
# Convert categorical columns to numeric using Label Encoding
# (LabelEncoder sorts the unique values and numbers them starting from 0)
le = LabelEncoder()
df['sex'] = le.fit_transform(df['sex'])  # female=0, male=1
df['embarked'] = le.fit_transform(df['embarked'])  # C=0, Q=1, S=2
df['alone'] = le.fit_transform(df['alone'])  # False=0, True=1
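To double-check which number each category received, the fitted encoder's classes_ attribute can be inspected. The sketch below does that on a toy column (illustrative values, with a separate encoder I named le_port for the example). It also shows pd.get_dummies as a one-hot alternative, which avoids implying that the ports have an order. That alternative is my suggestion, not part of the steps above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column with the three embarkation ports (illustrative values)
ports = pd.Series(["S", "C", "Q", "S", "C"])

le_port = LabelEncoder()
codes = le_port.fit_transform(ports)

# classes_ is sorted, and each class's position is its integer code
mapping = {str(label): code for code, label in enumerate(le_port.classes_)}
print(mapping)

# One-hot alternative: one indicator column per port, no implied ordering
one_hot = pd.get_dummies(ports, prefix="embarked")
print(one_hot.columns.tolist())
```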
🎯 Step 5: Define Features and Target
X = df.drop('survived', axis=1)
y = df['survived']
🧪 Step 6: Split the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
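One optional tweak to the split is the stratify argument of train_test_split, which keeps the share of survivors the same in the training and test sets. The sketch below shows the effect on toy labels (made-up counts, not the Titanic data); the variable names are just for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 60 non-survivors (0) and 40 survivors (1), illustrative only
y_toy = np.array([0] * 60 + [1] * 40)
X_toy = np.arange(100).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratify, both splits keep the original survivor rate
print(y_tr.mean(), y_te.mean())
```

Without stratify the two rates can drift apart by chance, which matters more on small datasets.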
🧠 Step 7: Train a Logistic Regression Model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
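If the solver ever complains that it did not converge, one common remedy is to scale the features before fitting. The sketch below wraps StandardScaler and LogisticRegression in a pipeline. It runs on synthetic data from make_classification (not the Titanic set) so that it is self-contained, and the scaling step is an assumption on my part rather than something the model above requires.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the Titanic set), so the sketch runs on its own
X_syn, y_syn = make_classification(n_samples=300, n_features=6, random_state=0)
X_syn[:, 0] *= 100  # exaggerate one feature's scale, a bit like fare next to pclass

X_a, X_b, y_a, y_b = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

# The scaler is fit on the training split only, then reused on the test split
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
scaled_model.fit(X_a, y_a)

test_score = scaled_model.score(X_b, y_b)
print("Test accuracy:", test_score)
```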
✅ Step 8: Evaluate the Model
y_pred = model.predict(X_test)
# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
# Confusion Matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))
📌 Sample Output:
Accuracy: 0.82
Confusion Matrix:
 [[83 10]
 [11 35]]
Classification Report:
               precision    recall  f1-score   support
           0       0.88      0.89      0.89        93
           1       0.78      0.76      0.77        46
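To see where the numbers in a report like this come from, here is a small sketch that computes accuracy, precision, and recall by hand from the four cells of a confusion matrix. It uses toy labels I made up (not the model's real predictions), so the values differ from the sample output.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical, not the model's real predictions)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are right
precision = tp / (tp + fp)  # of the predicted survivors, how many really survived
recall = tp / (tp + fn)  # of the real survivors, how many the model found

print(tn, fp, fn, tp)
print(accuracy, precision, recall)
```

Precision and recall here are for class 1 (survived); the report lists the same two metrics for class 0 as well.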
🧠 What I Learned
Logistic Regression is a simple but powerful algorithm for binary classification problems like survival prediction.
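Cleaning and encoding the data correctly is crucial: the model cannot work with missing values or text categories, so both steps have to happen before training.
The model reached about 82% accuracy on the held-out test set, which is not bad for a first ML model!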
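For anyone curious what Logistic Regression does under the hood, here is a rough sketch: it computes a weighted sum of the features and passes it through the sigmoid function to get a survival probability. The example fits a model on synthetic data (not the Titanic set) and checks that the hand-computed probabilities agree with scikit-learn's predict_proba. The helper name sigmoid and the variable names are my own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not the Titanic set)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression(max_iter=200).fit(X_demo, y_demo)

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear score: intercept plus the weighted sum of the features
scores = X_demo @ clf.coef_.ravel() + clf.intercept_[0]
manual_proba = sigmoid(scores)

# Probabilities scikit-learn reports for class 1
sk_proba = clf.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sk_proba))

# predict() corresponds to thresholding that probability at 0.5
manual_pred = (manual_proba > 0.5).astype(int)
```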
🚀 What’s Next?
In the next blog, I plan to try out a Random Forest classifier and compare results.
I’ll also show how to save the model and make predictions on new data.
Thanks for following my journey!
If you’re learning data science too, let’s grow together.
— Farsana | Aspiring Data Scientist | First ML Project Completed!
Written by

Farsana Thasnem PA
Aspiring Data Scientist | Physics Graduate | Passionate about Machine Learning, Python, and Data Storytelling. Sharing my journey, projects, and learnings in the world of data science.