Yet Even More About PyTorch Neural Network Weight Initialization

I’ve been working through the details of the PyTorch neural network library. I’m still examining basic concepts like weight and bias initialization. Even a task as simple as setting weights to some fixed value is surprisingly tricky.

Here’s example code that sets up a 4-7-3 NN (for the Iris Dataset problem):

# PyTorch 0.4.1 Anaconda3 5.2.0 (Python 3.6.5)
import torch as T

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.fc1 = T.nn.Linear(4, 7)  # 'fully connected'
    T.nn.init.xavier_uniform_(self.fc1.weight)
    T.nn.init.zeros_(self.fc1.bias)

    self.fc2 = T.nn.Linear(7, 3)
    T.nn.init.xavier_uniform_(self.fc2.weight)
    T.nn.init.uniform_(self.fc2.bias, -0.05, 0.05)

  def forward(self, x):
    x = T.tanh(self.fc1(x))
    x = self.fc2(x) 
    return x
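To make the example concrete, here is a self-contained sketch that instantiates the 4-7-3 network above and pushes a dummy batch through it. The dummy data is made up for illustration; as in the listing, `T` is `torch`:

```python
import torch as T

class Net(T.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = T.nn.Linear(4, 7)   # input -> hidden
        T.nn.init.xavier_uniform_(self.fc1.weight)
        T.nn.init.zeros_(self.fc1.bias)
        self.fc2 = T.nn.Linear(7, 3)   # hidden -> output
        T.nn.init.xavier_uniform_(self.fc2.weight)
        T.nn.init.uniform_(self.fc2.bias, -0.05, 0.05)

    def forward(self, x):
        x = T.tanh(self.fc1(x))
        return self.fc2(x)

net = Net()
dummy = T.rand(2, 4)       # two fake Iris items, 4 features each
out = net(dummy)
print(out.shape)           # torch.Size([2, 3]) -- raw scores for 3 classes
```

Note that forward() returns raw output scores, not softmax probabilities, which is the usual pattern when the loss function applies softmax itself.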

There’s a lot going on in that code. I was experimenting by setting hard-coded weight values, for example:

for j in range(7):
  for i in range(4):
    self.fc1.weight[j][i] = 0.5555  # errors

But this throws a RuntimeError. The fc1.weight object is a Parameter (a Tensor subclass with requires_grad=True), and autograd refuses an in-place modification of a leaf tensor that requires gradients. This code, however, works:

for j in range(7):
  for i in range(4):
    self.fc1.weight.data[j][i] = 0.5555 # OK

The fc1.weight.data attribute is the underlying base Tensor, so assigning to it bypasses autograd entirely (it works, but silently sidesteps gradient tracking). A very helpful expert (“ptrblck”) on the PyTorch discussion forum recommended this cleaner approach:
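A quick way to see the Parameter-vs-Tensor distinction for yourself (my own sketch, using a bare Linear layer rather than the full network):

```python
import torch as T

fc1 = T.nn.Linear(4, 7)
print(type(fc1.weight).__name__)        # Parameter
print(type(fc1.weight.data).__name__)   # Tensor
print(fc1.weight.requires_grad)         # True  -- autograd is watching
print(fc1.weight.data.requires_grad)    # False -- writes here go untracked
```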

with T.no_grad():
  for j in range(7):
    for i in range(4):
      self.fc1.weight[j][i] = 0.5555  # OK
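A vectorized variant of the same idea (my own sketch, not from the forum thread): inside a no_grad() block, the in-place fill_() method sets every element of the weight matrix in one call, with no explicit loops:

```python
import torch as T

fc1 = T.nn.Linear(4, 7)
with T.no_grad():
    fc1.weight.fill_(0.5555)   # set all 28 weights in one call
print(fc1.weight[0][0].item())
```

The weight keeps requires_grad=True afterward, so training proceeds normally from the hard-coded starting values.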

Wow. I’ve used many code libraries before and trust me, this is tricky stuff. The moral of the story is that in order to be successful at writing code, you have to relentlessly pay attention to very tiny details. Thinking in terms of the big picture just doesn’t work when you’re implementing code.



I used to enjoy reading the daily “Herman” cartoons in newspapers by Canadian cartoonist Jim Unger (1937 – 2012). The series ran from 1975 to 1992.
