Chapter 2: End-to-End ML with Housing Data

Khushi Rawat

Following my Chapter 1 post “Kicking Off My ML Journey with Aurélien Géron’s Book – Chapter 1: The Machine Learning Landscape”, where I laid the groundwork for understanding what Machine Learning is and the different learning paradigms, I’ve now completed Chapter 2, which marks the first real hands-on project in the book.

Why I’m Reading This Book

I started this book to go beyond surface-level ML and deeply understand Machine Learning, Deep Learning, and eventually Reinforcement Learning. So far, I’ve completed:

  • The Machine Learning A-Z™ 2025 course

  • My college-level Intro to ML

  • A club project on YOLOv8, which I’ll blog about soon

This book is my next serious step into the field, and Chapter 2 didn’t disappoint.

The Housing Price Prediction Project

In this chapter, I implemented an end-to-end ML project using California housing data. It’s a complete cycle—from loading the data to training, fine-tuning, and evaluating the model.

Major Steps Covered:

  1. Frame the Problem

    • Predict median housing prices in California districts (regression problem)
  2. Load & Explore the Data

    • Used pandas, matplotlib, and seaborn to analyse distributions, check for missing values, and identify feature correlations
  3. Create a Test Set

    • Learned the importance of consistent splits using train_test_split and stratified sampling
  4. Data Cleaning & Preparation

    • Handled missing values, categorical features, and feature scaling

    • Built data pipelines using scikit-learn’s Pipeline and ColumnTransformer

  5. Select and Train a Model

    • Trained Linear Regression, Decision Tree, and Random Forest models

    • Used cross-validation to evaluate and compare models

  6. Fine-Tune the Model

    • Used Grid Search and Randomized Search for hyperparameter tuning
  7. Evaluate on Test Set

    • Evaluated the final model on the held-out test set to get an unbiased estimate of its generalization error
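To make step 3 concrete, here is a minimal sketch of a stratified split. It uses synthetic stand-in data rather than the real housing CSV the chapter downloads, and the bin edges mirror the income categories Géron uses in the book:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the California housing data.
rng = np.random.default_rng(42)
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, size=1000),
    "median_house_value": rng.uniform(15_000, 500_000, size=1000),
})

# Bin income into categories so the split preserves its distribution.
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)

# Stratify on the income category so the test set stays representative.
train_set, test_set = train_test_split(
    housing, test_size=0.2, stratify=housing["income_cat"], random_state=42
)
```

With `stratify=` set, each income category appears in the test set in almost exactly the same proportion as in the full dataset, which a plain random split does not guarantee.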
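The pipelines from step 4 can be sketched as follows. The column names here are illustrative (borrowed from the housing dataset's schema); the pattern is the point: impute and scale the numeric columns, one-hot encode the categorical one, and let `ColumnTransformer` route each column to the right sub-pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_attribs = ["median_income", "total_rooms"]
cat_attribs = ["ocean_proximity"]

# Numeric columns: fill missing values with the median, then standardize.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Route numeric and categorical columns to their own transformers.
preprocessing = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs),
])

# Tiny frame with a missing value and a categorical column to exercise both paths.
df = pd.DataFrame({
    "median_income": [3.2, np.nan, 7.1],
    "total_rooms": [880.0, 1200.0, 540.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"],
})

prepared = preprocessing.fit_transform(df)
print(prepared.shape)  # (3, 4): 2 scaled numeric columns + 2 one-hot columns
```

Because the whole thing is a single transformer, the exact same preprocessing can be refit on the training set and reapplied to the test set with no copy-pasted cleaning code.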
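Step 5's model comparison looks roughly like this. I'm using `make_regression` as a stand-in dataset so the snippet runs on its own; with the real housing features the idea is identical:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Stand-in regression data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

for name, model in [
    ("Linear Regression", LinearRegression()),
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),
    ("Random Forest", RandomForestRegressor(n_estimators=50, random_state=42)),
]:
    # 5-fold cross-validation; scikit-learn reports negated RMSE, so flip the sign.
    scores = cross_val_score(
        model, X, y, scoring="neg_root_mean_squared_error", cv=5
    )
    rmse = -scores
    print(f"{name}: mean RMSE {rmse.mean():.1f} (std {rmse.std():.1f})")
```

Comparing the mean and spread of the fold scores, rather than one number from a single split, is what makes the comparison trustworthy.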
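And a minimal sketch of step 6's hyperparameter tuning with `GridSearchCV` (the grid values here are small placeholders chosen so the example runs quickly, not the ones from the book; `RandomizedSearchCV` is a drop-in alternative when the search space is large):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in regression data.
X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=42)

# Try every combination in the grid with 3-fold cross-validation.
param_grid = {
    "n_estimators": [10, 30],
    "max_features": [2, 4],
}
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

After fitting, `grid_search.best_estimator_` is the refit model with the best-scoring combination, ready to be evaluated on the test set.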

Key Learnings

  • Data pipelines are game-changers. Automating preprocessing ensures cleaner, reusable workflows.

  • Stratified sampling is essential when a key feature's distribution is skewed — it keeps the test set representative of the full dataset.

  • Cross-validation beats a single train/test split for comparing models, since averaging over folds reduces the variance of the evaluation.

  • Even basic models like Linear Regression can offer a good starting baseline.

Reflections

Chapter 2 truly felt like building something real. It connected the dots between theory and practice, especially around data handling, model evaluation, and pipeline automation. I found it rewarding to see how each small decision—from how I split the data to how I encoded features—affected final performance.

What’s Next?

I’ll be moving into the chapters on classification, training deep neural networks, and eventually Reinforcement Learning. I also have a separate post coming soon on my YOLOv8 club project that uses computer vision for retail analytics.

If you’re learning ML too or working on similar projects, I’d love to connect!
