Neural Network from Scratch in Python: A Step-by-Step Guide

Neural networks are the cornerstone of modern machine learning, powering applications like image recognition, natural language processing, and even autonomous vehicles. While frameworks like TensorFlow and PyTorch abstract away much of the complexity, building a neural network from scratch gives you invaluable insight into how these models work at a fundamental level.
In this post, we'll build a simple feedforward neural network from scratch using Python. We'll use the MNIST dataset of handwritten digits (0–9) to train our model. By the end of this guide, you’ll not only understand how neural networks function but also how to implement one from the ground up.
What is a Neural Network?
A neural network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons) that learn to recognize patterns in data. Neural networks are powerful tools for classification, regression, and feature extraction, making them ideal for tasks like image recognition, speech processing, and more.
Key Components of a Neural Network
Input Layer: The layer that receives input data, such as an image or a feature vector.
Hidden Layers: These layers perform computations on the data using weights and biases. They enable the network to learn complex patterns.
Output Layer: The layer that produces the model's predictions. For classification tasks, it outputs the probabilities for each class.
Activation Functions: Functions applied to neurons to introduce non-linearity into the network, enabling it to model complex relationships. Common activation functions include ReLU, Sigmoid, and Softmax.
Weights and Biases: Parameters that the network learns during training to minimize error. Weights connect the neurons between layers, and biases shift each neuron's output; a single neuron's computation is sketched right after this list.
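To make the last two components concrete, here is a minimal, illustrative sketch (not part of the network we build below) of what one neuron computes with NumPy:

import numpy as np

inputs = np.array([0.5, -0.2, 0.1])     # example feature vector
weights = np.array([0.4, 0.7, -0.3])    # one weight per input
bias = 0.1                              # shifts the weighted sum

z = np.dot(weights, inputs) + bias      # pre-activation: weighted sum plus bias
a = max(z, 0.0)                         # ReLU activation introduces non-linearity
print(z, a)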
Step 1: Loading and Preprocessing Data
Before training a neural network, it’s crucial to preprocess the data to ensure it is in a suitable format for the network.
Why Preprocessing is Important
Raw data often requires cleaning and transformation. In the case of image data, such as the MNIST dataset, preprocessing typically includes:
Normalization: Scaling pixel values to a smaller range (0 to 1) to help the network converge faster during training.
Shuffling: Randomly rearranging the data to ensure that the model is not biased toward any particular pattern in the data.
Splitting: Dividing the data into a training set (for learning) and a validation set (for evaluating model performance).
Loading and Shuffling the Data
The MNIST dataset is stored in CSV format, where each row contains the digit label in the first column, followed by the 784 pixel values of a flattened 28x28 grayscale image. We'll load and shuffle the data to ensure randomness during training.
import numpy as np
import pandas as pd

def load_and_shuffle_data(filepath):
    df = pd.read_csv(filepath)     # Load CSV into a DataFrame
    data_array = df.values         # Convert DataFrame to a NumPy array
    np.random.shuffle(data_array)  # Shuffle the rows in place
    return data_array
Splitting and Normalizing Data
We will split the dataset into a training set and a validation set. The pixel values are normalized by dividing them by 255 (since pixel values range from 0 to 255) to scale them between 0 and 1.
def split_and_normalize_data(data, dev_size=1000):
    total_samples, total_features = data.shape

    # Validation data
    validation_data = data[:dev_size].T          # Transpose so each column is one sample
    val_labels = validation_data[0]              # Labels are in the first row after transposing
    val_features = validation_data[1:] / 255.0   # Normalize pixel values to [0, 1]

    # Training data
    train_data = data[dev_size:].T               # Remaining samples are used for training
    train_labels = train_data[0]
    train_features = train_data[1:] / 255.0

    return val_features, val_labels, train_features, train_labels
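With both helpers defined, preparing the data takes two calls. The file name below is only a placeholder; point it at wherever your copy of the MNIST CSV lives:

data = load_and_shuffle_data("mnist_train.csv")  # placeholder path; adjust to your copy of the CSV
val_features, val_labels, train_features, train_labels = split_and_normalize_data(data)

print(train_features.shape)  # (784, num_training_samples) after the transpose
print(val_features.shape)    # (784, 1000) with the default dev_size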
Step 2: Initializing Neural Network Parameters
Network Architecture
We'll create a simple feedforward neural network with:
Input Layer: 784 neurons (since MNIST images are 28x28 pixels, resulting in 784 features).
Hidden Layer: 10 neurons (chosen arbitrarily for simplicity).
Output Layer: 10 neurons (one for each digit class: 0–9).
The weights and biases connecting these layers are initialized randomly. These parameters will be learned during the training process.
def initialize_parameters(input_dim=784, hidden_units=10, output_units=10):
    weights1 = np.random.uniform(-0.5, 0.5, (hidden_units, input_dim))     # Input-to-hidden weights
    bias1 = np.random.uniform(-0.5, 0.5, (hidden_units, 1))                # Hidden-layer biases
    weights2 = np.random.uniform(-0.5, 0.5, (output_units, hidden_units))  # Hidden-to-output weights
    bias2 = np.random.uniform(-0.5, 0.5, (output_units, 1))                # Output-layer biases
    return weights1, bias1, weights2, bias2
Step 3: Forward Propagation
Forward propagation is the process of passing input data through the network to compute predictions. It involves matrix multiplication between the data and the weights, followed by activation functions to introduce non-linearity.
Activation Functions
ReLU (Rectified Linear Unit): Used for the hidden layer. ReLU sets all negative values to zero, allowing the model to learn non-linear patterns.
Softmax: Used for the output layer to convert the network’s raw scores into probabilities. The softmax function ensures that the predicted values sum to 1, which makes them interpretable as probabilities.
def relu_activation(z):
    return np.maximum(z, 0)

def softmax_activation(z):
    exp_z = np.exp(z - np.max(z, axis=0, keepdims=True))  # Subtract the column max for numerical stability
    return exp_z / np.sum(exp_z, axis=0, keepdims=True)
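As a quick, purely illustrative sanity check, each column of the softmax output should sum to 1, even when the raw scores are large enough that a naive exponentiation would overflow:

scores = np.array([[2.0, 1000.0],
                   [1.0, 1001.0],
                   [0.1, 1002.0]])   # two columns of raw scores, one with very large values
probs = softmax_activation(scores)
print(probs.sum(axis=0))             # [1. 1.] — each column is a valid probability distribution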
Forward Pass
In forward propagation, we calculate the pre-activations (the weighted sums of inputs) and then apply activation functions.
def forward_pass(weights1, bias1, weights2, bias2, features):
    preactivation1 = np.dot(weights1, features) + bias1      # Input-to-hidden weighted sum
    activation1 = relu_activation(preactivation1)            # Apply ReLU activation
    preactivation2 = np.dot(weights2, activation1) + bias2   # Hidden-to-output weighted sum
    activation2 = softmax_activation(preactivation2)         # Apply softmax activation
    return preactivation1, activation1, preactivation2, activation2
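A small shape check (illustrative only, using random parameters and a random batch) shows that the output has one row per class and one column per sample:

w1, b1, w2, b2 = initialize_parameters()
dummy_batch = np.random.rand(784, 5)          # 5 fake "images", each a column of 784 pixel values
_, _, _, probabilities = forward_pass(w1, b1, w2, b2, dummy_batch)
print(probabilities.shape)                    # (10, 5): 10 class probabilities per sample
print(probabilities.sum(axis=0))              # each column sums to 1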
Step 4: Backward Propagation
Backward propagation is the process of calculating gradients of the loss function with respect to the model parameters (weights and biases). These gradients will be used to update the parameters and minimize the loss.
Loss Function: Cross-Entropy Loss
For classification tasks, we use the cross-entropy loss, which measures how far the predicted class probabilities are from the true (one-hot encoded) labels. A convenient property of pairing cross-entropy with a softmax output is that the gradient of the loss with respect to the output pre-activations simplifies to the predicted probabilities minus the one-hot labels; that is exactly the delta2 term in the code below.
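The gradient code below relies on two small helpers that are not defined elsewhere in this post: one-hot label encoding and the ReLU derivative. Here is one reasonable sketch of them, along with the cross-entropy loss itself, which is handy for monitoring training even though the gradient code never needs to evaluate it explicitly:

def encode_labels(labels, num_classes=10):
    # Turn a vector of digit labels into a one-hot matrix of shape (num_classes, num_samples)
    one_hot = np.zeros((num_classes, labels.size))
    one_hot[labels.astype(int), np.arange(labels.size)] = 1
    return one_hot

def relu_derivative(z):
    # Derivative of ReLU: 1 where the pre-activation was positive, 0 elsewhere
    return (z > 0).astype(float)

def cross_entropy_loss(predictions, labels):
    # Average negative log-probability assigned to the correct class (illustrative helper)
    encoded = encode_labels(labels)
    return -np.mean(np.sum(encoded * np.log(predictions + 1e-9), axis=0))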
def compute_gradients(preactivation1, activation1, preactivation2, activation2, weights1, weights2, features, labels, num_samples):
    encoded_labels = encode_labels(labels)                                  # One-hot encode the labels
    delta2 = activation2 - encoded_labels                                   # Output-layer error term
    grad_weights2 = np.dot(delta2, activation1.T) / num_samples             # Gradient w.r.t. output-layer weights
    grad_bias2 = np.sum(delta2, axis=1, keepdims=True) / num_samples        # Gradient w.r.t. output-layer biases
    delta1 = np.dot(weights2.T, delta2) * relu_derivative(preactivation1)   # Hidden-layer error term
    grad_weights1 = np.dot(delta1, features.T) / num_samples                # Gradient w.r.t. hidden-layer weights
    grad_bias1 = np.sum(delta1, axis=1, keepdims=True) / num_samples        # Gradient w.r.t. hidden-layer biases
    return grad_weights1, grad_bias1, grad_weights2, grad_bias2
Step 5: Training the Neural Network
Training a neural network involves iterating over the data, performing forward propagation, calculating the gradients using backward propagation, and updating the parameters using an optimization algorithm (e.g., gradient descent).
def update_parameters(weights1, bias1, weights2, bias2, grad_weights1, grad_bias1, grad_weights2, grad_bias2, learning_rate):
    weights1 -= learning_rate * grad_weights1   # Update hidden-layer weights
    bias1 -= learning_rate * grad_bias1         # Update hidden-layer biases
    weights2 -= learning_rate * grad_weights2   # Update output-layer weights
    bias2 -= learning_rate * grad_bias2         # Update output-layer biases
    return weights1, bias1, weights2, bias2
Training Loop
In each iteration, we perform forward propagation, compute the gradients, and update the parameters.
def train_neural_network(features, labels, learning_rate=0.10, num_iterations=500):
    num_samples = features.shape[1]
    weights1, bias1, weights2, bias2 = initialize_parameters()  # Initialize parameters
    for iteration in range(num_iterations):
        preactivation1, activation1, preactivation2, activation2 = forward_pass(
            weights1, bias1, weights2, bias2, features)
        grad_weights1, grad_bias1, grad_weights2, grad_bias2 = compute_gradients(
            preactivation1, activation1, preactivation2, activation2,
            weights1, weights2, features, labels, num_samples)
        weights1, bias1, weights2, bias2 = update_parameters(
            weights1, bias1, weights2, bias2,
            grad_weights1, grad_bias1, grad_weights2, grad_bias2, learning_rate)
        if iteration % 10 == 0:
            predictions = predict_labels(activation2)
            accuracy = calculate_accuracy(predictions, labels)
            print(f"Iteration {iteration}: Accuracy = {accuracy:.4f}")
    return weights1, bias1, weights2, bias2
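The loop reports progress through two helpers that are not shown above. A minimal version, consistent with how they are called, might look like this:

def predict_labels(output_activations):
    # Pick the class with the highest probability in each column
    return np.argmax(output_activations, axis=0)

def calculate_accuracy(predictions, labels):
    # Fraction of samples where the predicted digit matches the true label
    return np.mean(predictions == labels)

With these in place, an end-to-end run could look like the following (the CSV path is a placeholder):

data = load_and_shuffle_data("mnist_train.csv")  # placeholder path
val_features, val_labels, train_features, train_labels = split_and_normalize_data(data)

weights1, bias1, weights2, bias2 = train_neural_network(train_features, train_labels)

# Evaluate on the held-out validation set
_, _, _, val_probs = forward_pass(weights1, bias1, weights2, bias2, val_features)
val_accuracy = calculate_accuracy(predict_labels(val_probs), val_labels)
print(f"Validation accuracy: {val_accuracy:.4f}")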
Conclusion
By following these steps, we’ve built a neural network from scratch that can classify handwritten digits from the MNIST dataset. Understanding the inner workings of a neural network is crucial for gaining insights into how machine learning models learn patterns from data.
While building neural networks from scratch is a great way to learn, in practice, you’ll likely use higher-level libraries like TensorFlow or PyTorch for efficiency. However, understanding these fundamentals will help you become a better practitioner of machine learning and give you the ability to troubleshoot and optimize your models at a deeper level.
Happy coding!
Check out the GitHub repo to find out more.