Implementing Convolutional Neural Network using PyTorch

Nitin Sharma

My previous article, “Convolutional Neural Network”, explained the architecture of a CNN and how it works. Another article, “Implementing CNN using TensorFlow”, showed how to implement a CNN using TensorFlow.

In this article, we will explore the process of creating and optimizing a simple Convolutional Neural Network (CNN) using PyTorch and Lightning. A CNN is a specialized type of neural network that excels in processing and classifying images.

We will begin by outlining the fundamental concepts of Convolutional Neural Networks, including their architecture and the role of convolutional layers, pooling layers, and activation functions. Following that, we will walk through the implementation steps in PyTorch, detailing how to set up the environment, load the data, and construct the network.

We will focus on building a CNN that can distinguish between images of Xs and Os. We will also cover the optimization techniques used to improve the model's accuracy and efficiency, making use of Lightning to streamline our training process.

An example of a CNN comprising Conv2D and MaxPool layers is shown below.

We will start by importing the needed libraries. First, we install the Lightning framework.

%%capture

!pip install lightning
# torch will allow us to create tensors.
import torch 
# torch.nn allows us to create a neural network.
import torch.nn as nn 
# nn.functional give us access to the activation and loss functions.
import torch.nn.functional as F 
# optim contains many optimizers. This time we're using Adam.
from torch.optim import Adam 
# lightning has tons of cool tools that make neural networks easier
import lightning as L 
# these are needed for the training data
from torch.utils.data import TensorDataset, DataLoader
## matplotlib allows us to draw the images used for input.
import matplotlib.pyplot as plt

Once we import the necessary Python modules, our next step will be to create images of the letters O and X. These images are essential for training and testing our neural network's performance. We need to design the images to closely resemble the examples provided below, ensuring they are clear and correctly formatted for optimal neural network processing. This will involve defining the size, resolution, and any specific features that make the letters recognizable. By preparing these images carefully, we can improve the accuracy and reliability of our model in recognizing and interpreting these characters.

We will begin the process by generating a visual representation of the letter "O." To do this, we will construct a 6x6 matrix of numbers. In this matrix, the number 0 will represent the color white, while the number 1 will represent the color black. Each element of the matrix will correspond to a pixel in the image, allowing us to form the distinctive shape of the letter "O" through the arrangement of these values.

## Create a 6x6 matrix of numbers where 0 represents white
## and 1 represents black.
o_image = [[0, 0, 1, 1, 0, 0],
           [0, 1, 0, 0, 1, 0],
           [1, 0, 0, 0, 0, 1],
           [1, 0, 0, 0, 0, 1],
           [0, 1, 0, 0, 1, 0],
           [0, 0, 1, 1, 0, 0]]
o_image # print out the matrix to verify that it is what we expect

We will create an image of the letter X by creating a similar 6x6 matrix, where the 1s are now in an X pattern.

x_image = [[1, 0, 0, 0, 0, 1],
           [0, 1, 0, 0, 1, 0],
           [0, 0, 1, 1, 0, 0],
           [0, 0, 1, 1, 0, 0],
           [0, 1, 0, 0, 1, 0],
           [1, 0, 0, 0, 0, 1]]
x_image

To visualize the o_image and x_image with matplotlib, we begin by using the subplots() function. This function generates a grid of subplots, returning an array named axarr[]. Each element in this array corresponds to a subplot defined by the parameters nrows (number of rows) and ncols (number of columns) we specify. By organizing the images into this grid, we can easily position and display each image within its respective subplot for clear and effective comparison.

## To draw the o_image and x_image, we first call subplots(), which creates 
## an array, called axarr[], with an entry for each element in a grid
## specified by nrows and ncols.
fig, axarr = plt.subplots(nrows=1, ncols=2, figsize=(5, 5))

## Now we pass o_image and x_image to .imshow() for each element
## in the grid created by plt.subplots()
axarr[0].imshow(o_image, cmap='gray_r') ## Setting cmap='gray_r' gives us reverse grayscale.
axarr[1].imshow(x_image, cmap='gray_r')

We will begin by loading the training data into a DataLoader, a powerful tool in PyTorch that streamlines the process of feeding data into our neural network for training. DataLoaders are particularly advantageous when working with large datasets for several reasons. First, they enable us to access our data in manageable batches, which helps reduce memory consumption and speeds up the training process. Second, DataLoaders provide an easy way to shuffle our dataset at the beginning of each epoch, ensuring that the model does not learn any unintended patterns from the order of the data. Finally, if we want to quickly test our code or validate our model's functionality without using the entire dataset, DataLoaders allow us to work with a smaller subset of the data.

In order to prepare our training data for the DataLoader, we will convert the images into tensors using the torch.tensor() function. This step is crucial because PyTorch requires inputs to be in tensor format for processing. Once converted, we will save these tensors as input_images, which will then be passed to the DataLoader for efficient batch processing during training. This systematic approach will facilitate a smoother training experience and help us achieve better results with our neural network.

## Convert the images into tensors...
input_images = torch.tensor([o_image, x_image]).type(torch.float32)
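As a quick sanity check, the sketch below rebuilds the same tensor and inspects its shape and dtype. The `unsqueeze(1)` line is an assumption added for illustration, not part of the original code: it shows one common way to make the channel dimension explicit as (batch, channels, height, width), whereas the article relies on `nn.Conv2d` also accepting a 3-D (channels, height, width) input in recent PyTorch versions.

```python
import torch

# Same 6x6 letters as above (0 = white, 1 = black).
o_image = [[0, 0, 1, 1, 0, 0],
           [0, 1, 0, 0, 1, 0],
           [1, 0, 0, 0, 0, 1],
           [1, 0, 0, 0, 0, 1],
           [0, 1, 0, 0, 1, 0],
           [0, 0, 1, 1, 0, 0]]
x_image = [[1, 0, 0, 0, 0, 1],
           [0, 1, 0, 0, 1, 0],
           [0, 0, 1, 1, 0, 0],
           [0, 0, 1, 1, 0, 0],
           [0, 1, 0, 0, 1, 0],
           [1, 0, 0, 0, 0, 1]]

input_images = torch.tensor([o_image, x_image]).type(torch.float32)
print(input_images.shape)  # torch.Size([2, 6, 6]) -> 2 images of 6x6 pixels
print(input_images.dtype)  # torch.float32

# Optional (assumption, not used in the article): add an explicit channel axis.
with_channel = input_images.unsqueeze(1)
print(with_channel.shape)  # torch.Size([2, 1, 6, 6])
```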

In this step, we will create tensors that represent the labels, which are the ideal output values corresponding to each input image in our dataset. Specifically, our convolutional neural network is designed to recognize two distinct letters: O and X.

To achieve this, we will define our output for the letter O as the tensor [1.0, 0.0], indicating that the first output neuron is activated for the letter O while the second one is not. Conversely, the tensor [0.0, 1.0] will be used to represent the ideal output for the letter X, where the second output neuron is activated.

These tensors will be crucial for training the neural network, as they will guide the model in learning to differentiate between the two letters based on the input images. All the generated labels for our training dataset will be saved in a variable named input_labels, which will facilitate easy access and manipulation during the training process.

## Create the labels for the input images
input_labels = torch.tensor([[1.0, 0.0], [0.0, 1.0]]).type(torch.float32)
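For larger datasets, writing each label by hand becomes tedious. A minimal sketch of an alternative, assuming we number the classes 0 for O and 1 for X (a numbering chosen here for illustration), is to generate the same one-hot rows with `torch.nn.functional.one_hot()`:

```python
import torch
import torch.nn.functional as F

# Assumed class numbering for this sketch: 0 = O, 1 = X.
class_indices = torch.tensor([0, 1])

# F.one_hot returns integer tensors, so we convert to float32 afterwards.
one_hot_labels = F.one_hot(class_indices, num_classes=2).type(torch.float32)

print(one_hot_labels)
# tensor([[1., 0.],
#         [0., 1.]])
```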

We will combine the input images with the input labels to create a TensorDataset, which we will then use to create a DataLoader.

## Now combine input_images and input_labels into a TensorDataset...
dataset = TensorDataset(input_images, input_labels) 
## ...and use the TensorDataset to create a DataLoader.
dataloader = DataLoader(dataset)
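To see what a DataLoader actually hands to the network, the sketch below iterates over one. The four all-zero "images" are hypothetical stand-in data, used only so the example is self-contained. With the default settings, each batch holds a single sample; `batch_size` and `shuffle` are the arguments that control batching and per-epoch shuffling.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in data: four blank 6x6 "images" with one-hot labels.
images = torch.zeros(4, 6, 6)
labels = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
dataset = TensorDataset(images, labels)

# Default settings: batch_size=1 and no shuffling, as in the article.
default_loader = DataLoader(dataset)

# Batches of two samples, reshuffled at the start of every epoch.
batched_loader = DataLoader(dataset, batch_size=2, shuffle=True)

first_images, first_labels = next(iter(default_loader))
print(first_images.shape, first_labels.shape)  # torch.Size([1, 6, 6]) torch.Size([1, 2])

# Number of batches per epoch for each loader.
print(len(default_loader), len(batched_loader))  # 4 2
```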

Build a convolutional neural network with PyTorch and Lightning

To build a convolutional neural network (CNN) using PyTorch, we will need to define a new class that extends the capabilities of LightningModule. This approach simplifies the training process and enhances model organization. The new class will encompass several key methods, each serving a specific purpose in the model's functionality:

  • __init__(): This method is crucial for initializing the CNN's parameters. Inside this method, you will set up the weights and biases for the network layers. Additionally, you’ll maintain any necessary bookkeeping information, such as the architecture details of the network and configurations for training.

  • forward(): In this method, you will define how data flows through the network during a forward pass. This includes the series of operations performed on the input data as it travels through each layer of the CNN, such as convolutional layers, activation functions, and pooling layers.

  • configure_optimizers(): This method is used to set up the optimization algorithm that will update the model's weights during training. In this tutorial, we will be using the Adam optimizer, which is well-regarded for its efficiency and effectiveness in optimizing deep learning models.

  • training_step(): This method handles the training process for each batch of data. It takes the training data as input and feeds it into the forward() method to obtain predictions. Afterward, it calculates the loss by comparing the predicted values with the actual target values. Additionally, it keeps track of the loss values, allowing for logging and monitoring during training, which is essential for assessing model performance.

By implementing these methods, we will have a well-structured and functional convolutional neural network ready for training with PyTorch.

Steps to build CNN using PyTorch

Let's build a simple Convolutional Neural Network (CNN) using the LightningModule. This network will help us extract features from images and classify them accordingly.

Step 1: Initializing Weights and Biases

We begin by initializing the weights and biases for our CNN. This step is crucial, as these parameters will be adjusted during training to improve the model's performance.

Step 2: Setting Up the Convolutional Layer

The first layer of our CNN is the convolutional layer, which we set up using nn.Conv2d(). This layer applies a filter to our input data to extract features. The parameters needed to configure this layer include:

  • in_channels: This parameter specifies the number of input channels. For instance, a grayscale (black and white) image has one channel, while a color image typically has three (for red, green, and blue).

  • out_channels: This parameter determines how many output channels the convolutional layer will produce. If the model receives multiple input channels, we can combine them into fewer output channels, or we can increase the number of output channels to capture more features.

  • kernel_size: This refers to the dimensions of the filter (also known as the convolutional kernel). In our implementation, we will use a 3x3 filter, but we have the flexibility to choose other sizes, including rectangular shapes, depending on our specific needs.
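The effect of these parameters can be checked on a dummy input. The sketch below assumes a single blank 6x6 grayscale image laid out as (batch, channels, height, width); with a 3x3 filter, no padding, and a stride of 1, each spatial dimension shrinks from 6 to 6 - 3 + 1 = 4.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

# One blank 6x6 grayscale image: (batch, channels, height, width).
image = torch.zeros(1, 1, 6, 6)
feature_map = conv(image)
print(feature_map.shape)  # torch.Size([1, 1, 4, 4])

# A single 3x3 filter has 9 weights plus 1 bias.
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 10
```

The count of 10 parameters matches what Lightning reports for the conv layer in its model summary.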

Step 3: Implementing Max Pooling

After the convolutional layer, we apply a max pooling operation using nn.MaxPool2d(). This step reduces the dimensionality of the feature maps, helping to extract the most important features and reduce computational load. The parameters for the max pooling layer include:

  • kernel_size: This defines the size of the pooling filter. In our case, we are using a 2x2 filter, which will help summarize the features in each 2x2 section of the input.

  • stride: The stride determines how far we move the pooling filter with each operation. In our example, we set the stride to 2, meaning that after applying the filter to one section, it will move 2 units over (or down), ensuring there is no overlap between pooling sections.
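As a small worked example, the sketch below applies the same pooling layer to a hand-written 4x4 feature map (the values are made up for illustration). Each non-overlapping 2x2 block is replaced by its largest value.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# A made-up 4x4 feature map: (batch, channels, height, width).
feature_map = torch.tensor([[[[1., 2., 0., 1.],
                              [3., 4., 1., 0.],
                              [0., 1., 5., 6.],
                              [1., 0., 7., 8.]]]])

pooled = pool(feature_map)
print(pooled)
# tensor([[[[4., 1.],
#           [1., 8.]]]])
```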

Step 4: Constructing the Fully Connected Neural Network

Now, we move on to constructing a small fully connected neural network (built from dense layers). This network takes in the features extracted by the convolutional and pooling layers. Its first layer, which feeds the hidden layer, is configured with:

  • Input features (in_features=4): The number of values fed into the network. Max pooling leaves a 2x2 feature map, which flattens into 4 values.

  • Output features (out_features=1): This layer produces a single value, which is passed through a ReLU activation function to form the one node of the hidden layer.

Additionally, we will implement an output layer that has:

  • Input features (in_features=1): The output of the hidden layer's ReLU feeds into this layer.

  • Output features (out_features=2): This layer produces two outputs, one for each of the two categories (O and X).
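A minimal sketch of these two layers on their own, using a blank vector of 4 values as a stand-in for the flattened pooling output, shows the shapes at each stage and the number of trainable parameters in each layer:

```python
import torch
import torch.nn as nn

input_to_hidden = nn.Linear(in_features=4, out_features=1)
hidden_to_output = nn.Linear(in_features=1, out_features=2)

# Stand-in for one flattened 2x2 pooled feature map.
flattened = torch.zeros(1, 4)

hidden = torch.relu(input_to_hidden(flattened))
scores = hidden_to_output(hidden)
print(hidden.shape, scores.shape)  # torch.Size([1, 1]) torch.Size([1, 2])


def count_params(layer):
    """Total number of weights and biases in a layer."""
    return sum(p.numel() for p in layer.parameters())


# 4 weights + 1 bias, then 2 weights + 2 biases.
print(count_params(input_to_hidden), count_params(hidden_to_output))  # 5 4
```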

Step 5: Calculating Loss with Cross Entropy

To assess how well our neural network is performing, we will use Cross Entropy Loss. This loss function compares the network's predicted classifications against the actual labels in our dataset. The implementation of this is done using nn.CrossEntropyLoss, which conveniently applies a SoftMax function to the output values internally. This means we don't need to apply the SoftMax ourselves during training. However, we must remember to apply it during inference after the model has been trained.
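To make the built-in SoftMax concrete, the sketch below compares `nn.CrossEntropyLoss` with a manual calculation on made-up output values. It assumes a PyTorch version that accepts class probabilities such as [1.0, 0.0] as targets (1.10 or later, to the best of my knowledge), which is the label format used in this article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw output values (logits) for one image, and the label for O.
logits = torch.tensor([[2.0, -1.0]])
target = torch.tensor([[1.0, 0.0]])

# Built-in loss: takes raw output values, applies log-SoftMax internally.
builtin_loss = nn.CrossEntropyLoss()(logits, target)

# Manual version: log-SoftMax, weight by the target, sum over classes,
# then average over the batch.
manual_loss = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(torch.allclose(builtin_loss, manual_loss))  # True
```

Because the loss already includes the SoftMax, passing SoftMax outputs into it would apply the function twice.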

Step 6: Applying the Filter and Activation Functions

We start the forward pass of our CNN by applying the filter to the input image. After this, the output from the convolution is passed through a ReLU activation function, which introduces non-linearity into the model.

Next, we take the output from the ReLU layer and feed it into the max pooling layer.

At this stage, we have a reduced matrix of feature values. To prepare this for input into our fully connected neural network, we flatten the matrix into a vector format.
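The steps so far can be traced with a short sketch that prints the tensor shape after each operation. The random 6x6 input and the freshly initialized (untrained) layers are assumptions for illustration; only the shapes matter here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Random stand-in image: (batch, channels, height, width).
x = torch.rand(1, 1, 6, 6)

x = conv(x)
conv_shape = tuple(x.shape)   # (1, 1, 4, 4): 3x3 filter shrinks 6x6 to 4x4

x = F.relu(x)                 # negative values become 0, shape unchanged

x = pool(x)
pool_shape = tuple(x.shape)   # (1, 1, 2, 2): 2x2 pooling halves each side

x = torch.flatten(x, 1)       # flatten everything except the batch dimension
flat_shape = tuple(x.shape)   # (1, 4): ready for nn.Linear(in_features=4, ...)

print(conv_shape, pool_shape, flat_shape)
```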

Step 7: Running the Flattened Values Through the Neural Network

Once the values are flattened, we can pass them through our fully connected layer, which includes the hidden layer along with the activation function, to obtain the final output for classification.

Step 8: Configuring the Optimizer

Finally, we need to set up the optimizer that will adjust our model's parameters. We pass the parameters we want to optimize, which can be accessed using self.parameters(), into the optimizer. For this implementation, we’ll use the Adam optimizer, setting a learning rate (lr) of 0.001.

Putting these steps together gives us a functioning CNN capable of processing images and making predictions. With proper training and validation, this model will learn to classify images effectively based on the features extracted from the data.
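To illustrate what the optimizer does with those parameters, here is a rough sketch of a single optimization step written in plain PyTorch. The tiny linear layer and random input are hypothetical stand-ins, and this is only an approximation of the bookkeeping that Lightning's Trainer handles for us on every batch.

```python
import torch
import torch.nn as nn
from torch.optim import Adam

torch.manual_seed(0)

# Hypothetical stand-in model and data, just to have something to optimize.
layer = nn.Linear(in_features=4, out_features=2)
optimizer = Adam(layer.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.rand(1, 4)
labels = torch.tensor([[0.0, 1.0]])

loss_before = loss_fn(layer(inputs), labels)

optimizer.zero_grad()    # clear gradients left over from any earlier step
loss_before.backward()   # compute gradients of the loss w.r.t. the parameters
optimizer.step()         # let Adam nudge each parameter using its gradient

loss_after = loss_fn(layer(inputs), labels)
print(loss_before.item(), loss_after.item())
```

With a learning rate this small, one step changes the loss only slightly, which is why many epochs are needed.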

## Now build a simple CNN...
class SimpleCNN(L.LightningModule):

    def __init__(self):

        super().__init__() 
        L.seed_everything(seed=42)

        self.conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        ## The flattened 2x2 pooled output, in_features=4, goes to
        ## a single hidden node, out_features=1...
        self.input_to_hidden = nn.Linear(in_features=4, out_features=1)
        ## ...and the single hidden node, in_features=1, goes to
        ## two outputs, out_features=2
        self.hidden_to_output = nn.Linear(in_features=1, out_features=2)

        self.loss = nn.CrossEntropyLoss()


    def forward(self, x):

        ## First we apply a filter to the input image
        x = self.conv(x)

        ## Then we run the output from the filter through a ReLU...
        x = F.relu(x)
        ## Then we run the output from the ReLU through a Max Pooling layer...
        x = self.pool(x)
        x = torch.flatten(x, 1) # flatten all dimensions except batch 
        x = self.input_to_hidden(x)
        x = F.relu(x)
        x = self.hidden_to_output(x)

        return x


    def configure_optimizers(self):

        return Adam(self.parameters(), lr=0.001)

    def training_step(self, batch, batch_idx):

        inputs, labels = batch 

        outputs = self.forward(inputs)

        ## Then we calculate the loss.
        loss = self.loss(outputs, labels)


        return loss

Training our Neural Network

To train our new convolutional neural network, we create a model from the new class, SimpleCNN. We then create a Lightning Trainer using the function L.Trainer() and use it to optimize the parameters. Note that we train for 700 epochs, which means we will complete 700 full passes through our training data. This may be sufficient to successfully optimize all of the parameters, but there is a possibility it might not be enough.

model = SimpleCNN()
trainer = L.Trainer(max_epochs=700)
trainer.fit(model, train_dataloaders=dataloader)
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO: GPU available: False, used: False
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: 
  | Name             | Type             | Params | Mode 
--------------------------------------------------------------
0 | conv             | Conv2d           | 10     | train
1 | pool             | MaxPool2d        | 0      | train
2 | input_to_hidden  | Linear           | 5      | train
3 | hidden_to_output | Linear           | 4      | train
4 | loss             | CrossEntropyLoss | 0      | train
--------------------------------------------------------------
19        Trainable params
0         Non-trainable params
19        Total params
0.000     Total estimated model params size (MB)
5         Modules in train mode
0         Modules in eval mode

Epoch 699: 100% 2/2 [00:00<00:00, 82.10it/s, v_num=2]

INFO: `Trainer.fit` stopped: `max_epochs=700` reached.

Having completed the training of our model, we are now positioned to utilize it for making predictions using new data. In particular, we will evaluate the efficacy of our model in predicting an image of the letter "X" that has been shifted one pixel to the right. To initiate this process, we will first generate an image of the letter "X" that is displaced by one pixel.

shifted_x_image = [[0, 1, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1],
                   [0, 0, 0, 1, 1, 0],
                   [0, 0, 0, 1, 1, 0],
                   [0, 0, 1, 0, 0, 1],
                   [0, 1, 0, 0, 0, 0]]
shifted_x_image

Let's check the image by drawing it with matplotlib.

fig, ax = plt.subplots(figsize=(2.5, 2.5))
ax.imshow(shifted_x_image, cmap='gray_r') ## Setting cmap='gray_r' gives us reverse grayscale.

Let's see if our trained convolutional neural network can accurately classify it as an X.

## First, let's make a prediction with the new image...
prediction = model(torch.tensor([shifted_x_image]).type(torch.float32))

## Now make the prediction easy to read and interpret by
## running it through torch.softmax() and torch.round()
predicted_label = torch.round(torch.softmax(prediction, dim=1), decimals=2) ## dim=1 applies softmax across the two output values in each row

predicted_label
tensor([[0.0200, 0.9800]], grad_fn=<RoundBackward1>)

We see that the trained network correctly predicted X, as the second output value, representing X, is larger than the first output value, representing O.
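If we want a letter rather than a pair of probabilities, one option is to take the index of the larger value with `torch.argmax()`. The sketch below is self-contained, so it uses a hypothetical untrained stand-in model and a blank placeholder image in place of the trained SimpleCNN and the shifted X; the class order ["O", "X"] is assumed from the label layout used earlier. Wrapping the call in `torch.no_grad()` skips gradient tracking, so the result carries no `grad_fn`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the trained model: maps a 6x6 image to 2 scores.
stand_in_model = nn.Sequential(nn.Flatten(), nn.Linear(36, 2))
class_names = ["O", "X"]  # assumed order: first output is O, second is X

image = torch.zeros(1, 6, 6)  # blank placeholder image

stand_in_model.eval()
with torch.no_grad():  # no gradient tracking is needed for inference
    scores = stand_in_model(image)

probabilities = torch.softmax(scores, dim=1)
predicted_index = torch.argmax(probabilities, dim=1).item()

print(scores.requires_grad)          # False
print(class_names[predicted_index])  # the predicted letter
```

With the trained model from this article, the same lines would be applied to `model` and `shifted_x_image` instead of the stand-ins.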
