Log #4: CNN Visualizer – Part 4

John Boamah

Learning Objectives

By the end of Log #4, you will be able to:

  • Understand the fundamental problem that ResNet architectures solve

  • Explain the concepts of residual and skip connections

  • Analyze the architectural differences between ResNet18 and ResNet50

  • Interpret ResNet architecture diagrams and layer compositions

  • Decide when to use ResNet18 versus ResNet50 for different projects

Introduction

In this log, the plan was to build the image upload endpoint and prediction endpoints, and to integrate proper error handling and validation. But before diving into implementation, we need to understand the models we are working with: ResNet18 and ResNet50.

Deep neural networks (DNNs) have revolutionized computer vision, but training very deep networks introduces major challenges. One of these challenges was addressed by Residual Networks (ResNets), introduced by Kaiming He and colleagues in 2015. Before ResNets, deeper networks often performed worse than shallower ones due to the vanishing gradient problem and the degradation problem. ResNet introduced the groundbreaking idea of residual learning: instead of learning the desired underlying mapping H(x) directly, the network learns a residual function F(x) = H(x) - x. This makes optimization easier and allows training of networks with hundreds or even thousands of layers.

The Residual Learning Concept

The core innovation of ResNet lies in its skip (or shortcut) connections. These allow the input to bypass one or more layers and be added directly to their output. Figure 1 shows a simple diagram of a residual network.

Figure 1: Simple Residual Network

Mathematically:

  • Traditional network: the stacked layers must learn the full mapping H(x) = F(x) directly

  • ResNet block: H(x) = F(x) + x, so the layers only need to learn the residual F(x)

Why skip connections matter

  1. Gradient flow: Gradients can flow directly through the skip connections, mitigating the vanishing gradient problem

  2. Identity mapping: If the optimal function is close to the identity, it is easier to push F(x) toward zero than to learn H(x) = x from scratch

  3. Deep training: Enables effective training of much deeper networks, as the sketch below illustrates
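Here is a minimal sketch of this pattern in PyTorch (assuming torch is installed); SmallResidualBlock and its channel count are illustrative, not the exact block used in ResNet18 or ResNet50:

```python
import torch
import torch.nn as nn

class SmallResidualBlock(nn.Module):
    """Illustrative residual block computing H(x) = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm in between
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # skip connection: add the input back

x = torch.randn(1, 64, 56, 56)          # dummy feature map
print(SmallResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

Because the input is added back unchanged, the gradient reaches x through both the convolutional path and the shortcut, which is what keeps very deep stacks trainable.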

ResNet Building Blocks

ResNet uses two main types of residual blocks:

  1. Basic Block (used in ResNet18)

Figure 2: Architecture of a basic block

  2. Bottleneck Block (used in ResNet50)

Figure 3: Architecture of a bottleneck block
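To make the difference concrete, here is a simplified sketch of the two convolution stacks in PyTorch (batch norm, activations, strides, and shortcut paths are omitted; the channel counts correspond to the first stage of each network):

```python
import torch.nn as nn

# Basic block (ResNet18/34): two 3x3 convolutions, same channel count throughout
basic_block_convs = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
)

# Bottleneck block (ResNet50/101/152): 1x1 reduce -> 3x3 -> 1x1 expand (x4)
bottleneck_convs = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),         # reduce channels
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(64, 256, kernel_size=1, bias=False),          # expand back (64 x 4)
)
```

The 1×1 convolutions keep the expensive 3×3 convolution operating on a reduced number of channels and then expand the output by a factor of 4, which is how ResNet50 stays computationally manageable despite its depth.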

ResNet18 Architecture

ResNet18 is a relatively lightweight variant with 18 layers, making it suitable for scenarios with limited computational resources while still delivering strong performance. The snippet after the specification list shows one way to check these numbers with torchvision.

ResNet18 Specifications:

  • Total Layers: 18 (16 convolutional + 1 initial conv + 1 FC)

  • Parameters: ~11.7 million

  • Block Type: Basic blocks

  • Blocks per Stage: [2, 2, 2, 2]

  • Feature Maps: [64, 128, 256, 512]
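If you have a recent version of torchvision installed, you can check these figures directly. This is a quick sketch; the values in the comments are approximate:

```python
import torchvision.models as models

model = models.resnet18(weights=None)  # architecture only, no pretrained weights

total_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {total_params / 1e6:.1f}M")  # ~11.7M

# Two basic blocks per stage; output channels 64, 128, 256, 512
for name in ["layer1", "layer2", "layer3", "layer4"]:
    stage = getattr(model, name)
    print(name, len(stage), stage[-1].conv2.out_channels)
```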

ResNet50 Architecture

ResNet50 is deeper and more powerful, using bottleneck blocks to balance computational cost against model capacity. Again, a short snippet after the specification list verifies the numbers.

ResNet50 Specifications:

  • Total Layers: 50 (49 convolutional + 1 FC)

  • Parameters: ~25.6 million

  • Block Type: Bottleneck blocks

  • Blocks per Stage: [3, 4, 6, 3]

  • Feature Maps: [256, 512, 1024, 2048]
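The same check works for ResNet50 (again assuming a recent torchvision). Note the [3, 4, 6, 3] block counts and how each bottleneck stage's output is 4× its internal width:

```python
import torchvision.models as models

model = models.resnet50(weights=None)

total_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {total_params / 1e6:.1f}M")  # ~25.6M

# 3, 4, 6, 3 bottleneck blocks per stage; output channels 256, 512, 1024, 2048
for name in ["layer1", "layer2", "layer3", "layer4"]:
    stage = getattr(model, name)
    print(name, len(stage), stage[-1].conv3.out_channels)
```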

Key Architectural Differences

| Aspect | ResNet18 | ResNet50 |
| --- | --- | --- |
| Depth | 18 layers | 50 layers |
| Block Type | Basic Block | Bottleneck Block |
| Parameters | ~11.7M | ~25.6M |
| Computational Cost | Lower | Higher |
| Accuracy | Good | Better |
| Training Time | Faster | Slower |
| Memory Usage | Lower | Higher |

Skip Connection Variants

ResNet handles dimension mismatches in skip connections using two approaches:

  1. Identity mapping: When input and output dimensions match

  2. Projection shortcuts: When dimensions change (using 1×1 convolutions)

```python
# Sketch of a residual block's forward pass
def residual_forward(x, block, projection=None):
    out = block(x)              # F(x): the block's learned transformation
    if out.shape == x.shape:
        return out + x          # identity shortcut
    return out + projection(x)  # projection shortcut: 1x1 conv matches dimensions
```

Performance Characteristics

ResNet18 Use Cases:

  • Resource-constrained environments

  • Real-time applications

  • When training time is critical

  • Baseline experiments

ResNet50 Use Cases:

  • High-accuracy requirements

  • Transfer learning backbone

  • Feature extraction for complex tasks

  • When computational resources are available

  • Production systems prioritizing accuracy

Training Considerations

  • Batch Normalization: Critical for training deep ResNets effectively

  • Learning Rate Scheduling: Step decay or cosine annealing are common choices

  • Weight Initialization: Kaiming (He) initialization works well for the convolutional layers (all three points are sketched below)
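A minimal PyTorch sketch of these three points; the optimizer settings, decay schedule, and epoch counts here are illustrative placeholders rather than recommendations:

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(weights=None)  # batch norm layers are already built in

# Kaiming (He) initialization for the convolutional layers
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Option 1: step decay (divide the learning rate by 10 every 30 epochs)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Option 2: cosine annealing over 90 epochs
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=90)

# In the training loop, call scheduler.step() once per epoch after the optimizer steps.
```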

Next steps (Log #5)

In Log #5, we will focus on creating the backend API endpoints for our CNN Visualizer, including:

  • Building an image upload endpoint to accept user images.

  • Adding an inference endpoint that runs predictions and returns activations.

  • Structuring JSON responses to be used on the frontend.

References

  • Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. The MIT Press, Cambridge, Massachusetts.

  • Prince, S. J. D. (2023). Understanding Deep Learning. The MIT Press, Cambridge, Massachusetts.
