Four Syntax Mysteries in a PyTorch Neural Network Definition

I work at a large tech company. One of my job responsibilities is to teach software engineers how to understand and implement neural networks, usually with PyTorch.

Experienced software engineers often have four questions about a PyTorch neural network definition. They’re the same four questions I had when I was learning PyTorch. The four questions are related to 1.) class initialization with the __init__() method, 2.) weight and bias initialization, 3.) why there is no softmax() call in the forward() method, and 4.) how the forward() method gets called.

Suppose you define a 4-7-3 single hidden layer neural network for the Iris dataset — 4 input values (sepal length and width, petal length and width), one hidden layer with 7 nodes, and 3 output nodes (setosa, versicolor, virginica).

import torch as T

class NeuralNet(T.nn.Module):
  def __init__(self):
    super(NeuralNet, self).__init__()  # explicit form; required in Python 2
    self.hid1 = T.nn.Linear(4, 7)  # 4-7-3
    self.oupt = T.nn.Linear(7, 3)

  def forward(self, x):
    z = T.tanh(self.hid1(x))  # or T.nn.functional.tanh
    z = self.oupt(z)  # no softmax
    return z

As briefly as possible, omitting pages of details:

1.) The NeuralNet class inherits from the T.nn.Module class. The super(NeuralNet, self).__init__() statement calls the Module class __init__(). Passing the NeuralNet and self arguments is required syntax in Python 2. In Python 3 you can use a mystery shortcut syntax and drop the arguments: super().__init__(). Both forms work in Python 3; I prefer the older, longer syntax.
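
To see that the two forms of super() do the same thing, here is a minimal sketch that uses plain Python classes instead of PyTorch. The names Base, LongForm, and ShortForm are made up for illustration; Base is only a stand-in for T.nn.Module.

```python
class Base:
  def __init__(self):
    self.ready = True  # set only if Base.__init__() runs

class LongForm(Base):
  def __init__(self):
    super(LongForm, self).__init__()  # explicit arguments
    self.name = "long"

class ShortForm(Base):
  def __init__(self):
    super().__init__()  # zero-argument form, Python 3 only
    self.name = "short"

a = LongForm()
b = ShortForm()
print(a.ready, b.ready)  # both objects were set up by Base
```

In a real T.nn.Module subclass, skipping the call altogether typically produces an error as soon as you assign a layer such as self.hid1, because Module has not yet set up its internal bookkeeping.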

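2.) The weights and biases are initialized by default using code that is conceptually very complex. If you add this explicit weight and bias initialization code, the NeuralNet class behaves the same way. The initial values are random, so they differ from run to run, but they are drawn from the same distributions as the defaults: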

import math
import torch as T

class NeuralNet(T.nn.Module):
  def __init__(self):
    super(NeuralNet, self).__init__()  # explicit form; required in Python 2
    self.hid1 = T.nn.Linear(4, 7)  # 4-7-3
    self.oupt = T.nn.Linear(7, 3)

    # explicit weight and bias init
    # bias bound = 1 / sqrt(number of inputs to the layer)

    T.nn.init.kaiming_uniform_(self.hid1.weight,
      a=math.sqrt(5.0))
    bound = 1 / math.sqrt(4)  # hid1 has 4 inputs
    T.nn.init.uniform_(self.hid1.bias, -bound, bound)

    T.nn.init.kaiming_uniform_(self.oupt.weight,
      a=math.sqrt(5.0))
    bound = 1 / math.sqrt(7)  # oupt has 7 inputs
    T.nn.init.uniform_(self.oupt.bias, -bound, bound)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = self.oupt(z)  # no softmax
    return z

For experimentation, I use the shorter default mechanism, but for production I use the longer explicit approach.
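
The a=math.sqrt(5.0) argument looks arbitrary, so here is a small sketch of the arithmetic behind it. It assumes the kaiming_uniform_() formula as I understand it: gain = sqrt(2 / (1 + a^2)), std = gain / sqrt(fan_in), and bound = sqrt(3) * std, where fan_in is the number of inputs to the layer (4 for hid1, 7 for oupt). The helper name kaiming_bound is made up for this example and is not part of PyTorch.

```python
import math

def kaiming_bound(fan_in, a):
  # hypothetical helper; follows the leaky-ReLU gain formula
  gain = math.sqrt(2.0 / (1.0 + a * a))
  std = gain / math.sqrt(fan_in)
  return math.sqrt(3.0) * std  # weights are uniform in [-bound, +bound]

for name, fan_in in [("hid1", 4), ("oupt", 7)]:
  w_bound = kaiming_bound(fan_in, math.sqrt(5.0))
  b_bound = 1.0 / math.sqrt(fan_in)  # bias bound used in the explicit init
  print(name, round(w_bound, 4), round(b_bound, 4))
```

Under these assumptions the weight bound collapses to 1 / sqrt(fan_in), the same value used for the biases, so each layer's weights and biases are drawn from the same uniform range.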

3.) In most non-PyTorch neural network libraries, for multi-class classification you apply softmax() to the output nodes and later use cross entropy error during training:

def forward(self, x):
  z = T.tanh(self.hid1(x))
  # z = T.softmax(self.oupt(z), dim=1)  # wrong!
  z = self.oupt(z)  # no softmax -- correct
  return z

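But in PyTorch, if you use torch.nn.CrossEntropyLoss() for training, softmax() (technically log-softmax) is applied automatically and invisibly inside the loss function, so calling softmax() in forward() as well would apply it twice. The raw output values are called logits; if you want probabilities after training, apply softmax() to them yourself. Note that for binary classification, regression, or when using mean squared error for training, in most cases you must apply any needed output node activation yourself (for example, sigmoid() for binary classification with BCELoss()).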
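
A quick way to convince yourself is to compare CrossEntropyLoss() on raw output values against an explicit log-softmax followed by NLLLoss(). This is a minimal sketch; the three output values and the target class are made up for illustration.

```python
import torch as T

logits = T.tensor([[2.0, -1.0, 0.5]])  # raw output for one item
target = T.tensor([0])                  # index of the correct class

# CrossEntropyLoss works directly on the raw output values
ce = T.nn.CrossEntropyLoss()(logits, target)

# the same loss computed in two explicit steps
log_probs = T.log_softmax(logits, dim=1)
nll = T.nn.NLLLoss()(log_probs, target)

# softmax is still useful after training, to show probabilities
probs = T.softmax(logits, dim=1)
print(ce.item(), nll.item(), probs)
```

The two loss values should agree to within rounding error, which is why the forward() method returns raw values and leaves the softmax step to the loss function.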

4.) When calling a NeuralNet object, if you don’t specify a method, a default __call__() method is invoked, which in turn calls the forward() method. For example:

import numpy as np
import torch as T
device = T.device("cpu")

net = NeuralNet().to(device)  # create neural network
X = T.tensor(np.array([[6.1, 3.1, 5.1, 1.1]],
  dtype=np.float32)).to(device)  # float32 to match the weights
logits = net(X)  # call forward()

The statement logits = net(X) invisibly calls the net.__call__() method, which is defined in the parent Module class. The __call__() method invisibly calls the forward() method. Very confusing for people who are new to PyTorch.
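
The mechanism is ordinary Python, not PyTorch magic. Here is a stripped-down sketch of the idea using made-up classes TinyModule and Doubler. TinyModule is only a stand-in: the real Module.__call__() does extra work, such as running hooks, before and after calling forward().

```python
class TinyModule:
  # simplified stand-in for T.nn.Module, for illustration only
  def __call__(self, *args, **kwargs):
    # the real Module.__call__() does more than this one line
    return self.forward(*args, **kwargs)

class Doubler(TinyModule):
  def forward(self, x):
    return 2 * x

net = Doubler()
result = net(21)  # same as net.__call__(21), which calls forward(21)
print(result)
```

Because the real __call__() does that extra work, the usual advice is to write net(X) rather than net.forward(X), even though both compute the same output in simple cases.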

Two Additional Mysteries

There are two other, slightly less common, mysteries that pop up when I do PyTorch training classes at my workplace.

5.) When defining a neural network, you can use either a Module from torch.nn (such as torch.nn.Tanh) or an equivalent function from torch.nn.functional (such as torch.nn.functional.tanh). You define a Module object in the __init__() method and then use the object in the forward() method. Or you just use the function version directly in the forward() method. Why are there two ways? Just to provide more flexibility at the expense of confusion for beginners. And to make matters more confusing, there are some equivalent functions in the root torch module (such as torch.tanh).

import torch as T
def forward(self, x):
  # z = T.tanh(self.hid1(x))  OR
  # z = T.nn.functional.tanh(self.hid1(x))  OR
  # z = self.tan_act(self.hid1(x))  where
  #   self.tan_act = T.nn.Tanh() is in __init__()  OR
  # z = T.nn.Tanh()(self.hid1(x))  # OK but inefficient
  . . .
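
Here is a small runnable check that the three styles give the same values. The input tensor is made up for illustration, and some PyTorch versions may print a deprecation warning for T.nn.functional.tanh.

```python
import torch as T

x = T.tensor([[-1.0, 0.0, 2.0]])

z1 = T.tanh(x)                # function in the root torch module
z2 = T.nn.functional.tanh(x)  # function in torch.nn.functional
tan_act = T.nn.Tanh()         # Module object, normally created in __init__()
z3 = tan_act(x)

print(T.allclose(z1, z2), T.allclose(z1, z3))
```

Since tanh has no weights to learn, the choice is mostly a matter of style. The Module form is the one you need when a layer must be passed around as an object.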

6.) You can define a PyTorch neural network using a class that derives from torch.nn.Module, as in the examples above. Or you can use a completely different shortcut syntax approach using the Sequential technique. The class definition approach is more flexible; the Sequential approach is sometimes useful when you need a simple NN defined inside a more complex neural system. See my post at https://jamesmccaffreyblog.com/2020/06/02/pytorch-sequential-vs-module-approaches-for-creating-a-neural-network/.
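
For comparison, here is a minimal sketch of the same 4-7-3 network written with the Sequential technique. The variable name seq_net is made up for this example.

```python
import torch as T

# the same 4-7-3 architecture, defined without a class
seq_net = T.nn.Sequential(
  T.nn.Linear(4, 7),
  T.nn.Tanh(),        # Module version of tanh
  T.nn.Linear(7, 3))  # no softmax, as before

X = T.tensor([[6.1, 3.1, 5.1, 1.1]], dtype=T.float32)
logits = seq_net(X)   # Sequential objects are callable too
print(logits.shape)
```

Notice that Sequential is built from Module objects, so the activation has to be T.nn.Tanh() rather than the T.tanh function.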

I’ve left out a massive amount of detail, but this is enough to answer most of the syntax mystery questions I get during my training classes at work.
