Cleaning Digits with ML: A Journey Through Chapter 3 of Hands-On ML

If you're diving into the world of machine learning, the MNIST dataset is often your rite of passage. It's a set of 70,000 grayscale images of handwritten digits (0–9), used for classification tasks.
In Chapter 3 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, I worked through building and evaluating digit classifiers. Here's what I did:
Loading and Exploring the Dataset
Using Scikit-Learn’s fetch_openml, I fetched the dataset and visualised a few samples using matplotlib. Each image is stored as a flattened 784-dimensional vector (28×28 pixels).
from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays instead of a pandas DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
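fetch_openml downloads the full 70,000-image set, so as a quick self-contained sketch I'm using scikit-learn's bundled load_digits (8×8 images, 64-dimensional vectors) as a stand-in here; the flatten/reshape idea is exactly the same as for the 784-dimensional MNIST vectors:

```python
from sklearn.datasets import load_digits

digits = load_digits()                 # 1,797 small grayscale digits, an MNIST stand-in
X, y = digits.data, digits.target
print(X.shape)                         # (1797, 64): each row is a flattened image
image = X[0].reshape(8, 8)             # un-flatten before plt.imshow(image, cmap="binary")
```

For MNIST itself the reshape would be `X[0].reshape(28, 28)` instead.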
Building Classifiers
I explored:
Binary classification (e.g., detecting the digit 5)
Multiclass classification (0–9)
Multilabel classification (multiple true labels per instance)
Multioutput classification (predicting pixel intensities of denoised images)
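The binary case can be sketched in a few lines. This is not my exact notebook code; it uses the small bundled load_digits set instead of MNIST and an SGDClassifier as the detector, but the pattern (turn the labels into a yes/no question, then cross-validate) is the same:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
y_is_5 = (y == 5)                      # binary target: "is this digit a 5?"

clf = SGDClassifier(random_state=42)
scores = cross_val_score(clf, X, y_is_5, cv=3, scoring="accuracy")
```

Note that accuracy alone is misleading here: only about 10% of digits are 5s, so a classifier that always says "not 5" already scores around 90%, which is why the book leans on precision, recall, and confusion matrices instead.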
Denoising Images with KNN
By adding random noise to the images and training a KNN model to predict the clean originals (one output per pixel), I built a basic multioutput system to clean up noisy digits.
import numpy as np

noise = np.random.randint(0, 100, (len(X_train), 784))
X_train_mod = X_train + noise  # noisy inputs; the clean X_train images become the targets
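A minimal end-to-end version of the denoiser, again using the small load_digits set as a stand-in (pixel intensities there run 0–16, so the noise range is scaled down accordingly):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, _ = load_digits(return_X_y=True)
X = X.astype(int)                          # pixel intensities 0-16, used as class labels
rng = np.random.default_rng(42)
X_noisy = X + rng.integers(0, 8, X.shape)  # corrupt every pixel with random noise

knn = KNeighborsClassifier()
knn.fit(X_noisy, X)                        # one output per pixel: multioutput classification
cleaned = knn.predict(X_noisy[:1])         # predict the clean pixels of a noisy digit
```

Because the target is a whole array of pixel values rather than a single label, scikit-learn treats each pixel as a separate output, which is what makes this a multioutput classification problem.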
Insights
Confusion matrices revealed where models struggle (e.g., 5s and 3s often get confused).
Feature scaling (StandardScaler) improved model performance.
OneVsOneClassifier was used for multiclass classification with Support Vector Machines (SVMs).
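The last two insights can be combined into one sketch: scale the features, wrap an SVM in OneVsOneClassifier, and inspect the confusion matrix. (SVC actually uses one-vs-one internally by default, so the wrapper mainly makes the strategy explicit.) This uses load_digits in place of MNIST to stay self-contained:

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# scaling first, then one binary SVM per pair of classes (45 pairs for 10 digits)
clf = make_pipeline(StandardScaler(), OneVsOneClassifier(SVC(random_state=42)))
clf.fit(X_train, y_train)

cm = confusion_matrix(y_test, clf.predict(X_test))  # rows: true digit, cols: predicted
```

Off-diagonal cells of the matrix show exactly which digit pairs the model mixes up, which is how confusions like 3s versus 5s become visible.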
This notebook helped me gain a deep understanding of classification pipelines. Check out my code and try it yourself!
GitHub: Khushhiii08/mnist-ch3-notebook