Log#3: CNN Visualizer - Part 2

John Boamah

Learning Objectives

By the end of Log #3, you should be able to:

  • Load pretrained CNN models like ResNet18 and ResNet50 using PyTorch’s torchvision.models.

  • Automatically use the official preprocessing transforms from the selected model’s weights.

  • Extract intermediate feature maps from convolutional layers using PyTorch hooks.

  • Retrieve CNN architecture details such as input channels, output channels, and kernel sizes.

  • Run inference and get top-5 predictions from the model.

Introduction

In Log #2, we set up the FastAPI backend for our CNN Visualizer, created health check endpoints, and prepared the project for the next phase. Now, in Log #3, we take the first real step toward model-based visualization. We will load a pretrained CNN, feed it images, and retrieve output predictions, activation statistics, and layer metadata.

To achieve this, let’s first create our model.py file:

backend/app/model.py
cnn-visualizer/
│
├── frontend/               
│
└── backend/
    ├── venv/               
    └── app/
        ├── main.py             
        └── model.py

This file will contain the CNNVisualizer class, which is responsible for:

  • Loading the model.

  • Preparing the image preprocessing pipeline.

  • Capturing and processing activations from convolutional layers.

Step 1: Imports and Dependencies

import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision.models import ResNet18_Weights, ResNet50_Weights
from typing import Dict, List, Tuple
import numpy as np

  • torch: core PyTorch library for building and training deep learning models.

  • torch.nn: provides neural network layers (e.g., nn.Conv2d) and loss functions.

  • torchvision.models: gives access to pretrained models like ResNet and more.

  • torchvision.transforms: handles image preprocessing tasks such as resizing, cropping, and normalization.

  • ResNet18_Weights, ResNet50_Weights: predefined weight configurations for loading ResNet models with pretrained ImageNet parameters.

  • typing (Dict, List, Tuple): adds type hints for better code clarity and maintainability.

  • numpy: used for numerical operations and converting PyTorch tensors to NumPy arrays for statistical analysis.

Step 2: The CNNVisualizer Class

class CNNVisualizer:
    def __init__(self, model_name: str = "resnet18"):
        self.device = torch.device(
            "cuda" if torch.cuda.is_available() else "cpu")
        self.model_name = model_name
        self.model = "None"
        self.hooks = []
        self.activations = {}

        self._load_model()

        if self.model_name == "resnet18":
            self.preprocess = ResNet18_Weights.DEFAULT.transforms()
        elif self.model_name == "resnet50":
            self.preprocess = ResNet50_Weights.DEFAULT.transforms()

  • model_name: specifies which pretrained architecture to load (supports "resnet18" or "resnet50").

  • device: automatically selects GPU ("cuda") if available; otherwise defaults to CPU.

  • model: initialized to None; it will hold the loaded pretrained model.

  • hooks: list to store registered PyTorch hooks for capturing intermediate layer outputs.

  • activations: dictionary for storing feature maps (layer outputs) during forward passes.

  • _load_model(): called at initialization to load the specified pretrained model with weights.

  • self.preprocess: sets the appropriate preprocessing pipeline (resize, crop, normalize) for the chosen model’s expected input format.
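
As a quick sanity check, you can instantiate the class and inspect what it set up. A minimal sketch, assuming model.py is on your import path:

from model import CNNVisualizer

viz = CNNVisualizer("resnet18")
print(viz.device)      # cuda or cpu, depending on your machine
print(viz.preprocess)  # the ImageClassification transform bundled with the weights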

Step 3: Loading the Model

def _load_model(self):
    if self.model_name == "resnet18":
        weights = ResNet18_Weights.DEFAULT
        self.model = models.resnet18(weights=weights)
    elif self.model_name == "resnet50":
        weights = ResNet50_Weights.DEFAULT
        self.model = models.resnet50(weights=weights)
    else:
        raise ValueError(f"Model {self.model_name} not supported")

    self.model.to(self.device)
    self.model.eval()

    print(f"Loaded {self.model_name} model on {self.device}")

  • Loads a pretrained model: uses torchvision.models to load ResNet18 or ResNet50 with default ImageNet weights.

  • Device transfer: moves the model to self.device so it runs on GPU if available, otherwise CPU.

  • Evaluation mode: calls .eval() to disable dropout and make batch normalization use its stored running statistics, ensuring consistent inference results.

  • Feedback: prints a confirmation showing which model was loaded and where it’s running.
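
Any other model name fails fast with a ValueError, so mistakes surface at construction time rather than mid-inference. A quick sketch:

from model import CNNVisualizer

try:
    CNNVisualizer("vgg16")  # anything other than resnet18/resnet50
except ValueError as e:
    print(e)  # Model vgg16 not supported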

Step 4: Capturing Intermediate Activations

def _register_hooks(self):
    def get_activation(name):
        def hook(model, input, output):
            self.activations[name] = output.detach()
        return hook

_register_hooks uses the nested helper function get_activation to:

  • Define a nested function get_activation(name) that creates a hook function for a given layer name.

  • The returned hook function captures the output of a layer during the forward pass.

  • Inside hook, the output tensor is detached from the computation graph to prevent gradients from being tracked and is stored in the self.activations dictionary with the layer’s name as the key.

layer_count = 0
for name, module in self.model.named_modules():
    if isinstance(module, nn.Conv2d):
        layer_name = f"conv_{layer_count}_{name}"
        hook = module.register_forward_hook(get_activation(layer_name))
        self.hooks.append(hook)
        layer_count += 1

print(f"Registered hooks for {layer_count} convolutional layers")

  • The loop iterates over all named modules (layers) in the model using named_modules().

  • For each module, it checks if the layer is a convolutional layer (nn.Conv2d).

  • For every convolutional layer found:

    • A unique layer_name is created using a counter and the layer’s original name.

    • A forward hook is registered on the layer using register_forward_hook(), which calls get_activation(layer_name) to capture that layer’s output during the forward pass.

    • The hook object is stored in self.hooks to manage and remove later.

    • The convolutional layer count increments to keep track of how many hooks are registered.

  • This allows capturing intermediate feature maps from all convolutional layers as the model processes an input.
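
To see the hook mechanism in isolation, here is a minimal, self-contained example on a single Conv2d, independent of the visualizer:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
captured = {}

def hook(module, inputs, output):
    # Called automatically after every forward pass through conv
    captured["conv"] = output.detach()

handle = conv.register_forward_hook(hook)
_ = conv(torch.randn(1, 3, 32, 32))
handle.remove()

print(captured["conv"].shape)  # torch.Size([1, 8, 32, 32])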

def _clear_hooks(self):
    for hook in self.hooks:
        hook.remove()
    self.hooks = []
    self.activations = {}

  • _clear_hooks: removes all registered forward hooks from the model to stop collecting activations, then resets both the hooks list and the stored activations dictionary.

Final code for step 4:

    def _register_hooks(self):
        """For intermediate activations"""

        def get_activation(name):
            def hook(model, input, output):
                self.activations[name] = output.detach()
            return hook

        self._clear_hooks()

        layer_count = 0
        for name, module in self.model.named_modules():
            if isinstance(module, nn.Conv2d):
                layer_name = f"conv_{layer_count}_{name}"
                hook = module.register_forward_hook(get_activation(layer_name))
                self.hooks.append(hook)
                layer_count += 1

        print(f"Registered hooks for {layer_count} convolutional layers")

    def _clear_hooks(self):
        for hook in self.hooks:
            hook.remove()
        self.hooks = []
        self.activations = {}
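
With the final code in place, a short sanity check (a sketch, assuming model.py is importable) shows the hooks filling self.activations during a forward pass:

import torch
from model import CNNVisualizer

viz = CNNVisualizer("resnet18")
viz._register_hooks()

# One forward pass triggers every registered hook
with torch.no_grad():
    viz.model(torch.randn(1, 3, 224, 224).to(viz.device))

print(len(viz.activations))  # 20 conv layers for ResNet18
name, act = next(iter(viz.activations.items()))
print(name, tuple(act.shape))  # e.g. conv_0_conv1 (1, 64, 112, 112)

viz._clear_hooks()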

Step 5: Get Layers Information

    def get_layer_info(self) -> List[Dict]:
        layers_info = []
        layer_count = 0

        for name, module in self.model.named_modules():
            if isinstance(module, nn.Conv2d):
                layers_info.append({
                    "layer_id": layer_count,
                    "name": name,
                    "full_name": f"conv_{layer_count}_{name}",
                    "in_channels": module.in_channels,
                    "out_channels": module.out_channels,
                    "kernel_size": module.kernel_size,
                    "stride": module.stride,
                    "padding": module.padding
                })
                layer_count += 1

        return layers_info

get_layer_info: extracts metadata about all convolutional layers in the model:

  • Iterates over all modules in the model.

  • Selects only layers of type nn.Conv2d.

  • For each conv layer, collects details like:

    • Layer ID (index in conv layer sequence)

    • Layer name and a full unique name with index

    • Number of input/output channels

    • Kernel size, stride, and padding values

  • Finally returns a list of dictionaries containing this info, useful for understanding model architecture and visualizing activations by layer.
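
For instance, printing the first few entries gives a compact view of the early architecture. A sketch, with expected values based on torchvision's ResNet18 layout:

from model import CNNVisualizer

viz = CNNVisualizer("resnet18")
for info in viz.get_layer_info()[:3]:
    print(info["full_name"], info["in_channels"], "->",
          info["out_channels"], "kernel:", info["kernel_size"])

# Expected output (torchvision ResNet18):
# conv_0_conv1 3 -> 64 kernel: (7, 7)
# conv_1_layer1.0.conv1 64 -> 64 kernel: (3, 3)
# conv_2_layer1.0.conv2 64 -> 64 kernel: (3, 3)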

Step 6: Running Predictions and Storing Results

    def predict_and_explain(self, image_tensor: torch.Tensor) -> Dict:
        """Run inference and capture all intermediate activations"""

        self._register_hooks()
        image_tensor = image_tensor.to(self.device)

        if image_tensor.dim() == 3:
            image_tensor = image_tensor.unsqueeze(0)

        with torch.no_grad():
            outputs = self.model(image_tensor)

        probabilities = torch.nn.functional.softmax(outputs[0], dim=0)
        top5_prob, top5_classes = torch.topk(probabilities, 5)

        results = {
            "prediction": {
                "top5_classes": top5_classes.cpu().numpy().tolist(),
                "top5_probabilities": top5_prob.cpu().numpy().tolist(),
                "raw_output": outputs.cpu().numpy().tolist()
            },
            "activations": self._process_activations(),
            "layer_info": self.get_layer_info()
        }

        self._clear_hooks()

        return results

predict_and_explain: registers hooks on all convolutional layers to capture intermediate outputs during the forward pass.

  • Moves the input image_tensor to the device (GPU or CPU).

  • Adds a batch dimension if the input tensor is a single image (3D → 4D tensor).

  • Runs the model in inference mode (torch.no_grad()) to prevent gradient calculation and save memory.

  • Applies softmax to model outputs to get prediction probabilities.

  • Then retrieves the top-5 predicted classes and their probabilities.

  • Gathers results dictionary including:

    • Top-5 class IDs and probabilities (converted to lists for easy JSON serialization).

    • Raw model outputs (logits).

    • Processed activations from each hooked convolutional layer.

    • Metadata about convolutional layers.

  • Clears all hooks to free resources and reset activations.

  • Returns the comprehensive results dictionary for downstream use.
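
End to end, the method can be exercised with a random tensor in place of a real image; a minimal sketch, assuming model.py is importable:

import torch
from model import CNNVisualizer

viz = CNNVisualizer("resnet18")

# A random 3D tensor stands in for a preprocessed image;
# predict_and_explain adds the batch dimension itself.
results = viz.predict_and_explain(torch.randn(3, 224, 224))

print(results["prediction"]["top5_classes"])  # five ImageNet class indices
print(len(results["activations"]))            # stats + data for each hooked layer
print(len(results["layer_info"]))             # 20 conv layers for ResNet18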

Step 7: Processing Activation Maps

    def _process_activations(self) -> Dict:
        processed_activations = {}

        for layer_name, activation in self.activations.items():
            activation_np = activation.cpu().numpy()

            if activation_np.shape[0] == 1:
                activation_np = activation_np[0]  # [channels, height, width]

            processed_activations[layer_name] = {
                "shape": list(activation_np.shape),
                "num_channels": activation_np.shape[0],
                "spatial_size": [activation_np.shape[1], activation_np.shape[2]],
                "mean_activation": float(np.mean(activation_np)),
                "std_activation": float(np.std(activation_np)),
                "max_activation": float(np.max(activation_np)),
                "min_activation": float(np.min(activation_np)),
                "data": activation_np.tolist()
            }

        return processed_activations

_process_activations: converts each stored activation tensor from PyTorch to a NumPy array for easier analysis.

  • If the batch size dimension is 1, it removes that extra dimension to get [channels, height, width].

  • For each layer’s activation, computes and stores:

    • The shape of the activation map.

    • Number of channels (feature maps).

    • Spatial size (height and width).

    • Basic statistics: mean, standard deviation, max, and min values.

    • Full activation data converted to a Python list for serialization or visualization.

  • Returns a dictionary mapping layer names to their processed activation details.
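
Continuing the sketch from Step 6, you can inspect one layer's processed entry from the results:

acts = results["activations"]
name, entry = next(iter(acts.items()))

print(name, entry["shape"])                          # e.g. conv_0_conv1 [64, 112, 112]
print(entry["num_channels"], entry["spatial_size"])  # 64 [112, 112]
print(f"{entry['mean_activation']:.4f} ± {entry['std_activation']:.4f}")

Keep in mind that the data field serializes every activation value as nested lists, so the returned dictionary can be large for deep layers with many channels.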

Combined code from step 1 to 7:

import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision.models import ResNet18_Weights, ResNet50_Weights
from typing import Dict, List, Tuple
import numpy as np


class CNNVisualizer:
    def __init__(self, model_name: str = "resnet18"):
        self.device = torch.device(
            "cuda" if torch.cuda.is_available() else "cpu")
        self.model_name = model_name
        self.model = "None"
        self.hooks = []
        self.activations = {}

        self._load_model()

        if self.model_name == "resnet18":
            self.preprocess = ResNet18_Weights.DEFAULT.transforms()
        elif self.model_name == "resnet50":
            self.preprocess = ResNet50_Weights.DEFAULT.transforms()


    def _load_model(self):
        if self.model_name == "resnet18":
            weights = ResNet18_Weights.DEFAULT
            self.model = models.resnet18(weights=weights)
        elif self.model_name == "resnet50":
            weights = ResNet50_Weights.DEFAULT
            self.model = models.resnet50(weights=weights)
        else:
            raise ValueError(f"Model {self.model_name} not supported")

        self.model.to(self.device)
        self.model.eval()

        print(f"Loaded {self.model_name} model on {self.device}")

    def _register_hooks(self):
        """For intermediate activations"""

        def get_activation(name):
            def hook(model, input, output):
                self.activations[name] = output.detach()
            return hook

        self._clear_hooks()

        layer_count = 0
        for name, module in self.model.named_modules():
            if isinstance(module, nn.Conv2d):
                layer_name = f"conv_{layer_count}_{name}"
                hook = module.register_forward_hook(get_activation(layer_name))
                self.hooks.append(hook)
                layer_count += 1

        print(f"Registered hooks for {layer_count} convolutional layers")

    def _clear_hooks(self):
        for hook in self.hooks:
            hook.remove()
        self.hooks = []
        self.activations = {}

    def get_layer_info(self) -> List[Dict]:
        layers_info = []
        layer_count = 0

        for name, module in self.model.named_modules():
            if isinstance(module, nn.Conv2d):
                layers_info.append({
                    "layer_id": layer_count,
                    "name": name,
                    "full_name": f"conv_{layer_count}_{name}",
                    "in_channels": module.in_channels,
                    "out_channels": module.out_channels,
                    "kernel_size": module.kernel_size,
                    "stride": module.stride,
                    "padding": module.padding
                })
                layer_count += 1

        return layers_info

    def predict_and_explain(self, image_tensor: torch.Tensor) -> Dict:
        """Run inference and capture all intermediate activations"""

        self._register_hooks()
        image_tensor = image_tensor.to(self.device)

        if image_tensor.dim() == 3:
            image_tensor = image_tensor.unsqueeze(0)

        with torch.no_grad():
            outputs = self.model(image_tensor)

        probabilities = torch.nn.functional.softmax(outputs[0], dim=0)
        top5_prob, top5_classes = torch.topk(probabilities, 5)

        results = {
            "prediction": {
                "top5_classes": top5_classes.cpu().numpy().tolist(),
                "top5_probabilities": top5_prob.cpu().numpy().tolist(),
                "raw_output": outputs.cpu().numpy().tolist()
            },
            "activations": self._process_activations(),
            "layer_info": self.get_layer_info()
        }

        self._clear_hooks()

        return results

    def _process_activations(self) -> Dict:
        processed_activations = {}

        for layer_name, activation in self.activations.items():
            activation_np = activation.cpu().numpy()

            if activation_np.shape[0] == 1:
                activation_np = activation_np[0]  # [channels, height, width]

            processed_activations[layer_name] = {
                "shape": list(activation_np.shape),
                "num_channels": activation_np.shape[0],
                "spatial_size": [activation_np.shape[1], activation_np.shape[2]],
                "mean_activation": float(np.mean(activation_np)),
                "std_activation": float(np.std(activation_np)),
                "max_activation": float(np.max(activation_np)),
                "min_activation": float(np.min(activation_np)),
                "data": activation_np.tolist()
            }

        return processed_activations

Test Script

Now let’s test the functionality of model.py.

Setup

Create a new file called run_test.py inside the backend app directory:

backend/app/run_test.py

Also, download an image to use for testing. For this example, we’ll use a cat image sourced from theguardian.com, saved as cat.png so the script below can open and preprocess it.

Test Script Code

import PIL.Image as Image
from torchvision.models import ResNet18_Weights, ResNet50_Weights

from model import CNNVisualizer

# Initialize the CNN Visualizer with ResNet18
visualizer = CNNVisualizer("resnet18")

# Load and preprocess the test image
img = Image.open("cat.png")
tensor_img = visualizer.preprocess(img)

# Run prediction and capture activations
results = visualizer.predict_and_explain(tensor_img)

# Get class names from the model metadata
class_names = ResNet18_Weights.DEFAULT.meta["categories"]
top5_classes = results["prediction"]["top5_classes"]
top5_probs = results["prediction"]["top5_probabilities"]

# Finally, print the top 5 predictions with their probabilities
for idx, (cls_id, prob) in enumerate(zip(top5_classes, top5_probs)):
    print(f"{idx+1}. {class_names[cls_id]} ({cls_id}): {prob*100:.2f}%")

Running the Test:

python run_test.py

Expected Output:

Loaded resnet18 model on cpu
Registered hooks for 20 convolutional layers
1. tiger cat (282): 87.59%
2. tabby (281): 9.10%
3. Egyptian cat (285): 1.67%
4. lynx (287): 1.04%
5. Persian cat (283): 0.15%

From the top 5 predictions, “tiger cat” with a probability of 87.59% correctly matches the input image of the cat, demonstrating that the model and preprocessing work as expected.

Next Steps (Log #4)

In log#4, we will focus on creating the backend API endpoints to serve our CNN Visualizer functionality. This includes:

  • Building an image upload endpoint to accept user images.

  • Adding an inference endpoint that runs predictions and returns activations.

  • Structuring JSON responses to be used on the frontend.

  • Integrating proper error handling and validation.

References

  • PyTorch Hooks: https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_forward_hook.html

  • PyTorch Models: https://docs.pytorch.org/vision/stable/models.html

  • PyTorch Documentation: https://pytorch.org/docs/stable/index.html
