Handwritten Digit Recognizer


1. Introduction to Digit Classification
In today's digital world, the ability of machines to understand handwritten digits is extremely valuable. From postal address reading to banking check verification, digit classification plays a crucial role. This project focuses on building a machine learning model that can automatically recognize handwritten digits with high accuracy.
2. Overview of the Project
The main goal of this project was to build a simple yet effective neural network model that can classify handwritten digits from images. We used Python as the programming language and popular libraries like TensorFlow and Keras to develop, train, and evaluate the model.
3. Dataset Used
For this project, we used the MNIST dataset, a benchmark dataset in the world of machine learning. It consists of 70,000 grayscale images of handwritten digits (0 to 9) — 60,000 for training and 10,000 for testing. Each image is 28x28 pixels in size.
Sample images from the dataset show handwritten digits such as 6 and 0.
The MNIST dataset is widely used because it is simple yet challenging enough to test a wide range of models.
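As a quick illustration, the dataset can be loaded in a single call through Keras (a minimal sketch, assuming TensorFlow is installed; the project's exact loading code is not shown here):

```python
from tensorflow.keras.datasets import mnist

# Load MNIST: 60,000 training images and 10,000 test images, each 28x28 grayscale
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)  # (60000, 28, 28)
print(x_test.shape)   # (10000, 28, 28)
```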
4. Data Preprocessing
Before feeding the data into the model, some preprocessing steps were applied:
- Normalization: Pixel values were scaled from [0, 255] to [0, 1]. This speeds up training and helps the model converge faster.
- Reshaping: Images were reshaped to match the input requirements of the neural network.
- One-Hot Encoding: Labels were converted into a binary class matrix (for example, the label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).
Preprocessing ensures that the model receives clean and standardized input for optimal performance.
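A minimal sketch of these preprocessing steps using the standard Keras utilities (the variable names are illustrative, not taken from the project code):

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalization: scale pixel values from [0, 255] down to [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Reshaping: flatten each 28x28 image into a 784-feature vector for the ANN
x_train_flat = x_train.reshape(-1, 784)
x_test_flat = x_test.reshape(-1, 784)

# One-hot encoding: e.g. label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
```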
5. Building the Model
🧠 Artificial Neural Network (ANN) Architecture
Our ANN model architecture is designed to be simple yet highly effective for handwritten digit classification. It consists of the following layers:
Input Layer:
- Accepts 784 features corresponding to the flattened 28×28 pixel grayscale images.
Hidden Layers:
- A Dense (fully connected) layer with 128 neurons, activated using the ReLU function to introduce non-linearity.
- A Dropout layer is incorporated to reduce overfitting by randomly disabling a fraction of neurons during training.
Output Layer:
- A Dense layer with 10 neurons, each representing a digit class (0–9), activated by the Softmax function to perform multi-class classification.
We chose this simple architecture because it strikes a good balance between model performance and computational efficiency, keeping training and prediction fast while remaining accurate.
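A minimal Keras sketch of this ANN; the dropout rate shown is an assumption, since the post does not state the exact fraction:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# 784 inputs -> 128 ReLU neurons -> dropout -> 10 softmax outputs
ann_model = Sequential([
    Dense(128, activation="relu", input_shape=(784,)),
    Dropout(0.2),  # assumed dropout fraction
    Dense(10, activation="softmax"),
])

ann_model.summary()
```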
🧩 Convolutional Neural Network (CNN) Architecture
To further enhance performance, we also implemented a Convolutional Neural Network (CNN), which is better suited for image data:
Input Layer:
- Accepts 28×28 pixel grayscale images directly, preserving the 2D spatial structure.
Convolutional Layers:
- Apply filters (kernels) to detect local patterns like edges, corners, and textures.
- Each filter extracts a specific type of feature from the image.
Pooling Layers:
- Use Max Pooling to downsample feature maps, reducing spatial dimensions and computational complexity.
- Helps retain important features while discarding noise.
Flatten Layer:
- Converts the 2D feature maps into a 1D vector to prepare for the fully connected layers.
Fully Connected Layers:
- Dense layers with ReLU activation process the extracted features for final classification.
Output Layer:
- A Dense layer with 10 neurons, activated by the Softmax function for multi-class prediction.
CNNs are particularly powerful for image classification because they automatically learn hierarchical representations, capturing spatial relationships between pixels — an ability that ANN models lack when working with flattened data.
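A sketch of a CNN along these lines; the filter counts, kernel sizes, and number of blocks are assumptions, since the post describes the layer types but not their exact settings:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Convolution + pooling blocks learn local features, then dense layers classify
cnn_model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # assumed 32 filters
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),                           # assumed 64 filters
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])

cnn_model.summary()
```

Note that training this CNN would require keeping the channel dimension, e.g. reshaping the images to (-1, 28, 28, 1) instead of flattening them.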
6. ⚡ ANN vs. CNN: Key Differences

| Aspect | ANN | CNN |
| --- | --- | --- |
| Input Handling | Flattens images into a 1D vector | Preserves 2D image structure |
| Feature Extraction | Manual | Automatic (via convolutional filters) |
| Suitable For | Simple datasets | Complex visual data |
| Performance | Good | Superior |
7. Training the Model
The model was trained using the following parameters:
Optimizer: Adam
Loss function: Categorical Crossentropy
Metrics: Accuracy
Epochs: 10
Batch Size: 32
During training, we observed the training and validation accuracy steadily increasing, indicating that the model was learning meaningful patterns from the data.
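A sketch of how the model might be compiled and trained with these settings, reusing the illustrative `ann_model`, `x_train_flat`, and `y_train` names from the earlier sketches (the validation split shown is an assumption):

```python
ann_model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

history = ann_model.fit(
    x_train_flat, y_train,
    epochs=10,
    batch_size=32,
    validation_split=0.1,  # assumed split for monitoring validation metrics
)
```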
📉 Training and Validation Loss
During the training phase, the model's loss decreased significantly over time, indicating that it was learning the correct patterns from the data.
A consistent reduction in both training and validation loss demonstrates that the model generalized well without overfitting.
Minor fluctuations in validation loss are expected due to the inherent variance in the validation set.
📈 Training and Validation Accuracy
The accuracy graph shows a rapid improvement in model performance within the first few epochs, eventually stabilizing above 99% for both training and validation datasets.
The small gap between training and validation accuracy indicates a good balance between bias and variance, meaning the model can generalize well to unseen data.
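The curves discussed above can be reproduced from the Keras `History` object; a minimal plotting sketch, assuming matplotlib is available and `history` comes from the training sketch earlier:

```python
import matplotlib.pyplot as plt

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

# Training vs. validation loss per epoch
ax_loss.plot(history.history["loss"], label="training loss")
ax_loss.plot(history.history["val_loss"], label="validation loss")
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("loss")
ax_loss.legend()

# Training vs. validation accuracy per epoch
ax_acc.plot(history.history["accuracy"], label="training accuracy")
ax_acc.plot(history.history["val_accuracy"], label="validation accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("accuracy")
ax_acc.legend()

plt.tight_layout()
plt.show()
```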
8. Model Evaluation
After training, the model achieved impressive performance:
Training Accuracy: ~98%
Test Accuracy: ~97%
We also plotted the loss and accuracy curves, which showed smooth convergence without major signs of overfitting.
A confusion matrix was generated to better understand where the model made mistakes. Most errors occurred between digits that look visually similar, like 4 and 9.
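A sketch of the evaluation step and the confusion matrix, assuming scikit-learn is available and reusing the illustrative names from the earlier sketches:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Overall accuracy on the held-out test set
test_loss, test_acc = ann_model.evaluate(x_test_flat, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

# Confusion matrix: rows are true digits, columns are predicted digits
y_pred = np.argmax(ann_model.predict(x_test_flat), axis=1)
y_true = np.argmax(y_test, axis=1)
print(confusion_matrix(y_true, y_pred))
```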
🎯 Performance Analysis using ROC Curve
The ROC (Receiver Operating Characteristic) curve for class 0 shows a perfect performance with an AUC (Area Under the Curve) score of 1.00.
This signifies that the model distinguishes class 0 with perfect sensitivity and specificity, which is ideal in classification tasks.
A straight rise to the top left corner reflects minimal false positives and false negatives.
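The per-class ROC curve can be computed in a one-vs-rest fashion; a sketch for class 0 using scikit-learn, again with the illustrative names used above:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Score for class 0 is the softmax probability the model assigns to digit 0
class0_scores = ann_model.predict(x_test_flat)[:, 0]
is_zero = (np.argmax(y_test, axis=1) == 0).astype(int)  # 1 if the true digit is 0

fpr, tpr, _ = roc_curve(is_zero, class0_scores)
print(f"AUC for class 0: {auc(fpr, tpr):.2f}")
```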
9. Challenges Faced
While the project was mostly straightforward, a few challenges were encountered:
Overfitting: Early models performed well on training data but poorly on testing data. Adding dropout layers and tuning hyperparameters helped address this.
Long Training Time: Although MNIST is small, trying different architectures took time. Using early stopping and a validation split helped speed up experiments.
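For the early stopping mentioned above, Keras provides a built-in callback; a minimal sketch (the patience value is an assumption):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation loss stops improving, and keep the best weights
early_stop = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)

history = ann_model.fit(
    x_train_flat, y_train,
    epochs=10,
    batch_size=32,
    validation_split=0.1,   # assumed validation split
    callbacks=[early_stop],
)
```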
10. Conclusion and Future Work
In this project, we successfully built a neural network capable of recognizing handwritten digits with high accuracy. It was a great exercise in understanding the fundamentals of deep learning, data preprocessing, and model evaluation.
For future improvements, we could:
Explore deeper or more advanced convolutional architectures beyond the simple CNN used here.
Implement data augmentation techniques to artificially expand the dataset.
Deploy the model in a simple web application for live digit recognition.
11. References
MNIST Dataset
TensorFlow Documentation
Keras Documentation
Various online tutorials and courses
🔗 Access the Full Code
You can find the complete project code, including model building, training, and evaluation, at the link below: