One of the dirty little secrets of machine learning research is that in order to get a paper published, it’s almost always necessary to demonstrate improved results of some sort. And by setting the global random number seed to many different values, researchers can significantly adjust experimental results.
Setting the random seed typically has two major effects. First, the seed controls the initial values of a network's weights and biases. Second, the seed controls the order in which training data is served up by a DataLoader object.
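A minimal sketch (not part of the demo below) of both effects: resetting the seed reproduces the initial weights of a Linear layer exactly, and also reproduces the shuffle order of a DataLoader.

```python
# sketch: one seed controls both weight init and DataLoader shuffle order
import torch as T

T.manual_seed(1)
lin_a = T.nn.Linear(4, 3)  # weights drawn from the seeded RNG
ldr_a = T.utils.data.DataLoader(range(8), batch_size=4, shuffle=True)
order_a = [b.tolist() for b in ldr_a]  # shuffle order also from the RNG

T.manual_seed(1)  # reset the seed: both effects repeat exactly
lin_b = T.nn.Linear(4, 3)
ldr_b = T.utils.data.DataLoader(range(8), batch_size=4, shuffle=True)
order_b = [b.tolist() for b in ldr_b]

print(T.equal(lin_a.weight, lin_b.weight))  # True
print(order_a == order_b)                   # True
```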
I put together a short demo experiment to illustrate the point. I used one of my standard multi-class classification demos. I used six different seed values (0, 3, 5, 363, 366, 999) to create and train a neural classifier. In pseudo-code:
loop many times
  set random seed value
  reload datasets
  create net (seed controls initial wts)
  train net (seed controls processing order)
  compute overall accuracy, error
  log results
end-loop
Even with a tiny demo dataset of just 200 training items, classification accuracy ranged from 68.50% to 86.00% — a very wide range.
For complex neural systems, such as convolutional NNs for image classification or transformer architectures for natural language processing, the effect of the random number seed can be very large. See the paper "Torch.manual_seed(3407) is All You Need: On the Influence of Random Seeds in Deep Learning Architectures for Computer Vision" by D. Picard.
In research, the correct way to deal with the effect of the random seed is to run an experiment using several different seed values and then average the results.
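A sketch of that idea: report the mean and standard deviation of accuracy across seeds rather than a single (possibly lucky) seed. The accuracy values below are made up for illustration only; the endpoints match the 68.50% and 86.00% extremes from the demo, but the middle values are invented.

```python
# sketch: summarize per-seed accuracies instead of cherry-picking one
import numpy as np

# illustrative accuracies only -- not actual per-seed demo results
accs = np.array([0.6850, 0.7200, 0.8100, 0.7750, 0.8600, 0.7400])
print("mean acc = %0.4f" % accs.mean())       # average over seeds
print("std dev  = %0.4f" % accs.std(ddof=1))  # sample std deviation
```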

I used to play a lot of golf with my pal Paul Ruiz when I lived in California and a.) had a lot of time, b.) had a lot of sunny weather. Now that I'm a.) older and have zero free time, b.) live in rainy Washington, my golf is limited to arcade games like this Williams Mini Golf from the mid-1960s. When I played real golf, my putting was pretty good but my driving was more like a random seed process.
Demo code. The data can be found at https://jamesmccaffreyblog.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_seed_effect.py
# predict politics type from sex, age, state, income
# effect of different random seed values
# PyTorch 1.12.1-CPU Anaconda3-2020.02 Python 3.7.6
# Windows 10/11
import numpy as np
import torch as T
device = T.device('cpu') # apply to Tensor or Module
# -----------------------------------------------------------
class PeopleDataset(T.utils.data.Dataset):
  # sex  age   state    income  politics
  # -1  0.27  0  1  0   0.7610  2
  # +1  0.19  0  0  1   0.6550  0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D
    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return preds, trgts  # as a Tuple
# -----------------------------------------------------------
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # for NLLLoss()
    return z
# -----------------------------------------------------------
def accuracy(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]  # all inputs
  Y = dataset[0:len(dataset)][1]  # all targets
  with T.no_grad():
    oupt = model(X)  # [200,3] log-probabilities
  arg_maxs = T.argmax(oupt, dim=1)  # predicted class per row
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()
# -----------------------------------------------------------
def overall_loss(model, ds, n_class):
  # MSE all-at-once version
  X = ds[0:len(ds)][0]  # all X values
  Y = ds[0:len(ds)][1]  # all targets, ordinal form
  with T.no_grad():
    oupt = T.exp(model(X))  # pseudo-probs form
  YY = T.nn.functional.one_hot(Y, num_classes=n_class)
  delta = YY - oupt
  delta_sq = T.multiply(delta, delta)  # not dot()
  sum_sq = T.sum(delta_sq, dim=1)  # process rows
  mse = T.mean(sum_sq)
  return mse
# -----------------------------------------------------------
def train(net, ds, opt, lr, bs, me):
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.NLLLoss()  # assumes log_softmax() output
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)
  else:
    raise ValueError("unrecognized optimizer " + str(opt))

  for epoch in range(0, me):
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      loss_val.backward()
      optimizer.step()
  return net
# -----------------------------------------------------------
def main():
  print("\nBegin demo of random seed effects \n")

  seeds = np.array([0, 3, 5, 363, 366, 999], dtype=np.int64)
  for i in range(len(seeds)):
    print("======================== ")

    # 0. set random seed
    seed = seeds[i]
    T.manual_seed(seed)
    np.random.seed(seed)

    # 1. create Dataset object
    print("Creating training Dataset ")
    train_file = ".\\Data\\people_train.txt"
    train_ds = PeopleDataset(train_file)  # 200 rows

    # 2. create network
    print("Creating 6-(10-10)-3 neural network ")
    net = Net().to(device)
    net.train()

    # 3. train model
    bat_size = 10
    max_epochs = 1000
    lrn_rate = 0.01
    print("Starting training . . . ", end="")
    train(net, train_ds, 'sgd', lrn_rate, bat_size,
      max_epochs)
    print("Done ")

    # 4. evaluate model loss and accuracy
    net.eval()
    acc_train = accuracy(net, train_ds)
    loss_train = overall_loss(net, train_ds, n_class=3)

    # 5. log results
    print("seed = %4d | acc = %0.4f | loss = %0.4f " % \
      (seed, acc_train, loss_train))
  # end-loop each seed

  print("\nEnd demo")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()

An extremely insightful blog post that demonstrates the challenges of working with seeds with a concrete example. This example shows the problem more starkly than any example I have seen before, thank you. I would gladly read more about "the dirty little secrets of machine learning."
The next escalation of this problem could be the use of multiple cores.
https://github.com/grensen/multi-core/raw/main/figures/dotnetfiddle_floating_point_issue.png
Let’s assume that after this test, 10% of the weights are affected by this problem, which occurs all the time but becomes more problematic during batch training. This will lead to non-reproducible results when using more than one core.
If your test was expensive and I have the chance to use a GPU with thousands of cores, this problem may become even more common, causing me to question why my performance differs so much from yours.
Floating point issues occur all the time, but on a single core they happen consistently, allowing us to achieve the same results. We can then find a sweet spot of optimal performance. However, when using multiple cores, it seems that we may obtain more inaccurate results, particularly when trying to reach top performance.
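A minimal sketch (not from the original comment) of the floating-point effect being described: addition is not associative, so changing the summation order, as can happen when work is split across cores, changes the result.

```python
# sketch: floating-point addition order changes the result
vals = [0.1] * 10 + [1e16, -1e16]

left_to_right = 0.0
for v in vals:
    left_to_right += v   # the huge values absorb the small ones

right_to_left = 0.0
for v in reversed(vals):
    right_to_left += v   # small values summed first survive

print(left_to_right == right_to_left)  # False
```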
Presumably, the bad seed results would then be better, and the better seeds would probably be worse, moving toward what the average accuracy would be. My solution is more training, even though it may not be the best.