One of my job responsibilities is to teach engineers and data scientists how to use the PyTorch neural network code library. There are many examples of how convolution works, but they tend to be too generic (not specific to PyTorch), or too specific (a very low-level explanation of the library functions).
Here’s an example that I use. The demo sets up an input of a simple 4×4 grayscale (1 channel) image with dummy pixel values 1 through 16. The demo sets up a convolution layer with a 2×2 kernel/weights of [[0.5, 0.6],[0.7,0.8]] and a bias of 1.0.
The diagram shows how sending the input to the convolution layer results in a 3×3 array of numbers. These values don’t have any direct interpretation — they’re just an abstract representation of the image. This abstract representation would be fed to other parts of a neural network. Training the network would give values of the convolution weights that best represent the image for classification or whatever the neural network is trained to do.
The demo leaves out a ton of optional details. For example, the kernel shifts one column to the right and, when it reaches the end of a row, it moves back to the left edge and shifts one row down. This movement of one column and one row is the (1,1) stride, which is the Conv2d default. There are other details, like padding, but the point of my demo is to explain how PyTorch convolution works, not to dive into the details.
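To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch (not part of the PyTorch demo) that computes the same 3×3 result by hand, using the demo's input, kernel, and bias values:

```python
# Minimal NumPy sketch of what the Conv2d layer computes for the demo values:
# slide the 2x2 kernel over the 4x4 input with stride (1,1), no padding.
import numpy as np

x = np.arange(1.0, 17.0).reshape(4, 4)        # dummy pixel values 1..16
kernel = np.array([[0.5, 0.6], [0.7, 0.8]])
bias = 1.0

out = np.zeros((3, 3))                        # (4-2+1) x (4-2+1) output
for r in range(3):
    for c in range(3):
        window = x[r:r+2, c:c+2]              # 2x2 patch under the kernel
        out[r, c] = np.sum(window * kernel) + bias

print(out)
# e.g., the top-left element is 1*0.5 + 2*0.6 + 5*0.7 + 6*0.8 + 1.0 = 11.0
```

The output height and width follow the usual formula (input_size - kernel_size) / stride + 1, which gives (4 - 2) / 1 + 1 = 3 here.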

The word “convoluted” means “complex and difficult to follow”. That definition applies well to convolutional neural networks. I think the description also applies to synthetic biology. Left: The major obstacle for storing information in DNA is synthesis — reading information from DNA is not as big a challenge. Right: A mouse gut that has been colonized with multiple engineered strains of Bacteroides labeled with different colors.
Demo code:
# convolution_demo.py
# PyTorch 1.10.0-CPU Anaconda3-2020.02 Python 3.7.6
# Windows 10

import numpy as np
import torch as T

device = T.device('cpu')

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()  # pre Python 3.3 syntax
    self.conv1 = T.nn.Conv2d(1, 1, 2)  # chnl-in, out, krnl
    wts = T.tensor([[[[0.5, 0.6], [0.7, 0.8]]]], dtype=T.float32)
    self.conv1.weight = T.nn.Parameter(wts)
    self.conv1.bias = T.nn.Parameter(T.tensor([1.0],
      dtype=T.float32))

  def forward(self, x):
    z = self.conv1(x)
    return z

# -----------------------------------------------------------

print("\nBegin PyTorch convolution demo ")

print("\nCreating net with one conv layer ")
net = Net().to(device)

print("\nConvolution layer wts: ")
print(net.conv1.weight)
print("Convolution layer bias: ")
print(net.conv1.bias)

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8],
  [9, 10, 11, 12], [13, 14, 15, 16]])
x = x.reshape(1, 1, 4, 4)  # bs, chnls, rows, cols
x = T.tensor(x, dtype=T.float32).to(device)

print("\nInput x is: ")
print(x)

print("\nSending x to net ")
z = net(x)

print("\nOutput is: ")
print(z)

print("\nEnd convolution demo ")


In some sense, I think of a convolutional neural network as a simplified neural network. Strictly speaking, the convolutional part is an additional system attached to a regular neural network. But if you understand a regular neural network, you can picture a CNN in a similar way: the input neuron becomes the input feature map, connected to the output neuron by a filter made of several weights instead of just one, and the output neuron becomes the output feature map, which at some point is flattened into the input layer of the regular neural network.
Complicated systems are best made simple, and if you understand how to build layer by layer, you can create your network over several layers and simply run forward and backward through the CNN. The forward pass then becomes very simple:
outputMap += kernel * inputMap
If the forward construction is right with respect to the activations, backpropagation just runs the same pattern in reverse:
kernelDerivative += inputMap * outputGradientMap
The update is then simply (subtracting, because gradient descent moves against the gradient of the loss):
kernel -= kernelDerivative * learningRate
Rotating the kernel (filter) by 180° is often recommended, but as far as I know it is not strictly necessary, so the CNN will also work without it. But if you think of the kernel as a line connecting the input map to the output map, you can perhaps see why the kernel gets flipped during backpropagation when you travel back along it. I did not say anything about pooling; some recommendations suggest setting the stride to 2 during convolution to achieve a more direct form of pooling.
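The three pseudocode lines above can be sketched as runnable NumPy. This is a minimal sketch, not a full CNN: the helper names conv2d_forward and conv2d_kernel_grad are mine, there is no bias, activation, or input gradient, and stride is fixed at 1:

```python
# Sketch of the forward pass, kernel gradient, and update described above,
# for a single 2D feature map and a single kernel (stride 1, no padding).
import numpy as np

def conv2d_forward(input_map, kernel):
    """outputMap += kernel * inputMap, at every kernel position."""
    ih, iw = input_map.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(kernel * input_map[i:i+kh, j:j+kw])
    return out

def conv2d_kernel_grad(input_map, output_grad):
    """kernelDerivative += inputMap * outputGradientMap, position by position."""
    oh, ow = output_grad.shape
    kh = input_map.shape[0] - oh + 1
    kw = input_map.shape[1] - ow + 1
    grad = np.zeros((kh, kw))
    for i in range(oh):
        for j in range(ow):
            grad += input_map[i:i+kh, j:j+kw] * output_grad[i, j]
    return grad

x = np.arange(1.0, 17.0).reshape(4, 4)      # same 4x4 input as the demo
k = np.array([[0.5, 0.6], [0.7, 0.8]])      # same 2x2 kernel as the demo
out = conv2d_forward(x, k)                  # 3x3 output feature map (no bias)
g = np.ones_like(out)                       # pretend upstream gradient of all 1s
dk = conv2d_kernel_grad(x, g)               # gradient of the loss wrt the kernel
k_new = k - 0.01 * dk                       # gradient-descent update (subtract)
```

With an all-ones upstream gradient, each kernel-gradient entry is just the sum of the input values that weight ever touched, which is easy to verify by hand.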
Even if CNNs are far more difficult to realize than I claim here, it is possible to build them this way, step by step, without being overwhelmed by all the information about CNNs — even for people like me with little mathematics background.
What I find exciting here is the idea that by following our “how to build deep learning systems” manual, we might be able to build systems that are completely unimaginable today. You know your recipe, you know how to run it forward and backward, and then you just have to get the update process working somehow, and it lives.