Cleaning Digits with ML: A Journey Through Chapter 3 of Hands-On ML

Khushi Rawat

If you're diving into the world of machine learning, the MNIST dataset is often your rite of passage. It's a set of 70,000 grayscale images of handwritten digits (0–9), used for classification tasks.

In Chapter 3 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, I worked through building and evaluating digit classifiers. Here's what I did:

Loading and Exploring the Dataset

I fetched the dataset with Scikit-Learn’s fetch_openml and visualised a few samples with matplotlib. Each image is stored as a 784-dimensional vector (28x28 pixels).

from sklearn.datasets import fetch_openml

# Download MNIST (70,000 images) as plain NumPy arrays rather than a DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target
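For the visualisation step, here is a minimal sketch (picking the first sample and the variable name some_digit are my own choices):

import matplotlib.pyplot as plt

# Each row is a flat 784-pixel vector; reshape it to 28x28 before plotting
some_digit = X[0].reshape(28, 28)
plt.imshow(some_digit, cmap='binary')
plt.axis('off')
plt.title(f"Label: {y[0]}")
plt.show()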

Building Classifiers

I explored:

  • Binary classification (e.g., detecting the digit 5; see the sketch after this list)

  • Multiclass classification (0–9)

  • Multilabel classification (multiple true labels per instance)

  • Multioutput classification (predicting pixel intensities of denoised images)
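As a rough sketch of the binary case, assuming the usual 60,000/10,000 MNIST train/test split and an SGDClassifier as the 5-detector:

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Standard MNIST split: first 60,000 images for training, the rest for testing
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Binary target: True for 5s, False for every other digit
y_train_5 = (y_train == '5')

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

# Plain accuracy looks great on such a skewed problem, which is why the
# confusion-matrix analysis further down matters more
cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring='accuracy')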

Denoising Images with KNN

By adding random noise to the images and training a k-nearest-neighbours model to reconstruct the originals, I built a basic multioutput system that cleans up noisy digits.

import numpy as np
# Add random pixel noise; the clean images become the training targets
X_train_mod = X_train + np.random.randint(0, 100, (len(X_train), 784))
y_train_mod = X_train
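Fitting a KNeighborsClassifier on the noisy images, with the clean images as targets, turns denoising into a multioutput prediction problem. A minimal sketch along those lines (the test-set handling and the plotting are my additions):

from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

# Apply the same kind of noise to the test set
X_test_mod = X_test + np.random.randint(0, 100, (len(X_test), 784))

# One target per pixel: a multioutput classification problem
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_mod, y_train_mod)

# Predict a cleaned-up version of a noisy test digit and display it
clean_digit = knn_clf.predict([X_test_mod[0]])
plt.imshow(clean_digit.reshape(28, 28), cmap='binary')
plt.axis('off')
plt.show()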

Insights

  • Confusion matrices revealed where models struggle (e.g., 5s and 3s often get confused).

  • Feature scaling (StandardScaler) improved model performance.

  • OneVsOneClassifier was used with Support Vector Machines (SVMs) for multiclass classification (sketched after this list).
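To tie these points together, here is a rough sketch of scaling, an explicit one-vs-one SVM, and a cross-validated confusion matrix (the 10,000-image subset is my own shortcut to keep SVM training time manageable):

from sklearn.preprocessing import StandardScaler
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# Standardising the 0-255 pixel values helps many classifiers
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype('float64'))

# Force the one-vs-one strategy: one SVM per pair of digits (45 classifiers for 10 classes)
ovo_clf = OneVsOneClassifier(SVC(random_state=42))

# Cross-validated predictions feed the confusion matrix, exposing pairs like 3 vs 5
y_train_pred = cross_val_predict(ovo_clf, X_train_scaled[:10000], y_train[:10000], cv=3)
conf_mx = confusion_matrix(y_train[:10000], y_train_pred)
print(conf_mx)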

This notebook helped me gain a deep understanding of classification pipelines. Check out my code and try it yourself!

GitHub: Khushhiii08/mnist-ch3-notebook
