One of the dirty little secrets of machine learning research is that in order to get a paper published, it’s almost always necessary to demonstrate improved results of some sort. And by setting the global random number seed to many different values, researchers can significantly adjust experimental results.
Setting the random seed typically has two major effects. First, the seed controls the initial values of a network's weights and biases. Second, the seed controls the order in which training data is served up by a DataLoader object.
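A minimal sketch (not part of the demo below) of both effects: resetting the seed reproduces the initial weights of a Linear layer exactly, and also reproduces the shuffle order of a DataLoader.

```python
# sketch: one seed controls both weight init and DataLoader shuffle order
import torch as T

T.manual_seed(1)
lin_a = T.nn.Linear(4, 3)  # weights drawn from the seeded RNG
ldr_a = T.utils.data.DataLoader(range(8), batch_size=4, shuffle=True)
order_a = [b.tolist() for b in ldr_a]  # shuffle order also from the RNG

T.manual_seed(1)  # reset the seed: both effects repeat exactly
lin_b = T.nn.Linear(4, 3)
ldr_b = T.utils.data.DataLoader(range(8), batch_size=4, shuffle=True)
order_b = [b.tolist() for b in ldr_b]

print(T.equal(lin_a.weight, lin_b.weight))  # True
print(order_a == order_b)                   # True
```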
I put together a short demo experiment to illustrate the point. I used one of my standard multi-class classification demos. I used six different seed values (0, 3, 5, 363, 366, 999) to create and train a neural classifier. In pseudo-code:
loop many times
  set random seed value
  reload datasets
  create net (seed controls initial wts)
  train net (seed controls processing order)
  compute overall accuracy, error
  log results
end-loop
Even with a tiny demo dataset of just 200 training items, classification accuracy ranged from 68.50% to 86.00% — a very wide range.
For complex neural systems, such as convolutional NNs for image classification or transformer architectures for natural language processing, the effect of the random number seed can be very large. See the paper "Torch.manual_seed(3407) is All You Need: On the Influence of Random Seeds in Deep Learning Architectures for Computer Vision" by D. Picard.
In research, the correct way to deal with the effect of the random seed is to run an experiment using several different seed values and then average the results.
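A sketch of that idea: report the mean and standard deviation of accuracy across seeds rather than a single (possibly lucky) seed. The accuracy values below are made up for illustration only; the endpoints match the 68.50% and 86.00% extremes from the demo, but the middle values are invented.

```python
# sketch: summarize per-seed accuracies instead of cherry-picking one
import numpy as np

# illustrative accuracies only -- not actual per-seed demo results
accs = np.array([0.6850, 0.7200, 0.8100, 0.7750, 0.8600, 0.7400])
print("mean acc = %0.4f" % accs.mean())       # average over seeds
print("std dev  = %0.4f" % accs.std(ddof=1))  # sample std deviation
```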

I used to play a lot of golf with my pal Paul Ruiz when I lived in California and a.) had a lot of time, b.) had a lot of sunny weather. Now that I'm a.) older and have zero free time, b.) live in rainy Washington, my golf is limited to arcade games like this Williams Mini Golf from the mid-1960s. When I played real golf, my putting was pretty good but my driving was more like a random seed process.
Demo code. The data can be found at https://jamesmccaffreyblog.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_seed_effect.py
# predict politics type from sex, age, state, income
# effect of different random seed values
# PyTorch 1.12.1-CPU Anaconda3-2020.02 Python 3.7.6
# Windows 10/11
import numpy as np
import torch as T
device = T.device('cpu') # apply to Tensor or Module
# -----------------------------------------------------------
class PeopleDataset(T.utils.data.Dataset):
  # sex  age   state    income  politics
  # -1  0.27  0  1  0   0.7610  2
  # +1  0.19  0  0  1   0.6550  0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D
    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return preds, trgts  # as a Tuple
# -----------------------------------------------------------
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # for NLLLoss()
    return z
# -----------------------------------------------------------
def accuracy(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]  # all inputs
  Y = dataset[0:len(dataset)][1]  # all targets
  with T.no_grad():
    oupt = model(X)  # [200,3] log-probabilities
  arg_maxs = T.argmax(oupt, dim=1)  # predicted class per row
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()
# -----------------------------------------------------------
def overall_loss(model, ds, n_class):
  # MSE all-at-once version
  X = ds[0:len(ds)][0]  # all X values
  Y = ds[0:len(ds)][1]  # all targets, ordinal form
  with T.no_grad():
    oupt = T.exp(model(X))  # pseudo-probs form
  YY = T.nn.functional.one_hot(Y, num_classes=n_class)
  delta = YY - oupt
  delta_sq = T.multiply(delta, delta)  # not dot()
  sum_sq = T.sum(delta_sq, dim=1)  # process rows
  mse = T.mean(sum_sq)
  return mse
# -----------------------------------------------------------
def train(net, ds, opt, lr, bs, me):
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.NLLLoss()  # assumes log_softmax() output
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)
  else:
    raise ValueError("unrecognized optimizer " + str(opt))

  for epoch in range(0, me):
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      loss_val.backward()
      optimizer.step()
  return net
# -----------------------------------------------------------
def main():
  print("\nBegin demo of random seed effects \n")

  seeds = np.array([0, 3, 5, 363, 366, 999], dtype=np.int64)
  for i in range(len(seeds)):
    print("======================== ")

    # 0. set random seed
    seed = seeds[i]
    T.manual_seed(seed)
    np.random.seed(seed)

    # 1. create Dataset object
    print("Creating training Dataset ")
    train_file = ".\\Data\\people_train.txt"
    train_ds = PeopleDataset(train_file)  # 200 rows

    # 2. create network
    print("Creating 6-(10-10)-3 neural network ")
    net = Net().to(device)
    net.train()

    # 3. train model
    bat_size = 10
    max_epochs = 1000
    lrn_rate = 0.01
    print("Starting training . . . ", end="")
    train(net, train_ds, 'sgd', lrn_rate, bat_size,
      max_epochs)
    print("Done ")

    # 4. evaluate model loss and accuracy
    net.eval()
    acc_train = accuracy(net, train_ds)
    loss_train = overall_loss(net, train_ds, n_class=3)

    # 5. log results
    print("seed = %4d | acc = %0.4f | loss = %0.4f " % \
      (seed, acc_train, loss_train))
  # end-loop each seed

  print("\nEnd demo")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()

An extremely insightful blog post that demonstrates the challenges of working with seeds with a concrete example. This example shows the problem more starkly than any example I have seen before, thank you. I would gladly read more about "the dirty little secrets of machine learning."
The next escalation of this problem could be the use of multiple cores.
https://github.com/grensen/multi-core/raw/main/figures/dotnetfiddle_floating_point_issue.png
Let’s assume that after this test, 10% of the weights are affected by this problem, which occurs all the time but becomes more problematic during batch training. This will lead to non-reproducible results when using more than one core.
If your test was expensive and I have the chance to use a GPU with thousands of cores, this problem may become even more common, causing me to question why my performance differs so much from yours.
Floating point issues occur all the time, but on a single core they happen consistently, allowing us to achieve the same results. We can then find a sweet spot of optimal performance. However, when using multiple cores, it seems that we may obtain more inaccurate results, particularly when trying to reach top performance.
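A minimal sketch (not from the original comment) of the floating-point effect being described: addition is not associative, so changing the summation order, as can happen when work is split across cores, changes the result.

```python
# sketch: floating-point addition order changes the result
vals = [0.1] * 10 + [1e16, -1e16]

left_to_right = 0.0
for v in vals:
    left_to_right += v   # the huge values absorb the small ones

right_to_left = 0.0
for v in reversed(vals):
    right_to_left += v   # small values summed first survive

print(left_to_right == right_to_left)  # False
```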
Presumably, the bad seed results would then be better, and the better seeds would probably be worse, moving toward what the average accuracy would be. My solution is more training, even though it may not be the best.