Cleaning Digits with ML: A Journey Through Chapter 3 of Hands-On ML

If you're diving into the world of machine learning, the MNIST dataset is often your rite of passage. It's a set of 70,000 grayscale images of handwritten digits (0–9), used for classification tasks.
In Chapter 3 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, I worked through building and evaluating digit classifiers. Here's what I did:
Loading and Exploring the Dataset
Using Scikit-Learn’s fetch_openml, I fetched the dataset and visualised a few samples using matplotlib. Each image is stored as a flattened 784-dimensional vector (28×28 pixels).
from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays instead of a pandas DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
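fetch_openml downloads the full 70,000-image set, so as a quick self-contained sketch I'm using scikit-learn's bundled load_digits (8×8 images, 64-dimensional vectors) as a stand-in here; the flatten/reshape idea is exactly the same as for the 784-dimensional MNIST vectors:

```python
from sklearn.datasets import load_digits

digits = load_digits()                 # 1,797 small grayscale digits, an MNIST stand-in
X, y = digits.data, digits.target
print(X.shape)                         # (1797, 64): each row is a flattened image
image = X[0].reshape(8, 8)             # un-flatten before plt.imshow(image, cmap="binary")
```

For MNIST itself the reshape would be `X[0].reshape(28, 28)` instead.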
Building Classifiers
I explored:
Binary classification (e.g., detecting the digit 5)
Multiclass classification (0–9)
Multilabel classification (multiple true labels per instance)
Multioutput classification (predicting pixel intensities of denoised images)
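The binary case can be sketched in a few lines. This is not my exact notebook code; it uses the small bundled load_digits set instead of MNIST and an SGDClassifier as the detector, but the pattern (turn the labels into a yes/no question, then cross-validate) is the same:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
y_is_5 = (y == 5)                      # binary target: "is this digit a 5?"

clf = SGDClassifier(random_state=42)
scores = cross_val_score(clf, X, y_is_5, cv=3, scoring="accuracy")
```

Note that accuracy alone is misleading here: only about 10% of digits are 5s, so a classifier that always says "not 5" already scores around 90%, which is why the book leans on precision, recall, and confusion matrices instead.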
Denoising Images with KNN
By adding random noise to the images and training a KNN model to predict the clean originals (one output per pixel), I built a basic multioutput system to clean up noisy digits.
import numpy as np

noise = np.random.randint(0, 100, (len(X_train), 784))
X_train_mod = X_train + noise  # noisy inputs; the clean X_train images become the targets
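A minimal end-to-end version of the denoiser, again using the small load_digits set as a stand-in (pixel intensities there run 0–16, so the noise range is scaled down accordingly):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, _ = load_digits(return_X_y=True)
X = X.astype(int)                          # pixel intensities 0-16, used as class labels
rng = np.random.default_rng(42)
X_noisy = X + rng.integers(0, 8, X.shape)  # corrupt every pixel with random noise

knn = KNeighborsClassifier()
knn.fit(X_noisy, X)                        # one output per pixel: multioutput classification
cleaned = knn.predict(X_noisy[:1])         # predict the clean pixels of a noisy digit
```

Because the target is a whole array of pixel values rather than a single label, scikit-learn treats each pixel as a separate output, which is what makes this a multioutput classification problem.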
Insights
Confusion matrices revealed where models struggle (e.g., 5s and 3s often get confused).
Feature scaling (StandardScaler) improved model performance.
OneVsOneClassifier was used for multiclass classification with Support Vector Machines (SVMs).
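The last two insights can be combined into one sketch: scale the features, wrap an SVM in OneVsOneClassifier, and inspect the confusion matrix. (SVC actually uses one-vs-one internally by default, so the wrapper mainly makes the strategy explicit.) This uses load_digits in place of MNIST to stay self-contained:

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# scaling first, then one binary SVM per pair of classes (45 pairs for 10 digits)
clf = make_pipeline(StandardScaler(), OneVsOneClassifier(SVC(random_state=42)))
clf.fit(X_train, y_train)

cm = confusion_matrix(y_test, clf.predict(X_test))  # rows: true digit, cols: predicted
```

Off-diagonal cells of the matrix show exactly which digit pairs the model mixes up, which is how confusions like 3s versus 5s become visible.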
This notebook helped me gain a deep understanding of classification pipelines. Check out my code and try it yourself!
GitHub: Khushhiii08/mnist-ch3-notebook