Bottom line: Hyperparameter random search can be effective, but the difficult part is determining what to parameterize and what range of values to try for each parameter.
When creating a neural network prediction model there are many architecture hyperparameters (number of hidden layers, number of nodes in each hidden layer, hidden activation function, weight initialization algorithm and its parameters, and so on). And then there are dozens of training hyperparameters (optimization algorithm, learning rate, momentum, batch size, number of training epochs, and so on).

In this demo, the best parameters were found in trial 2: 16 hidden nodes, tanh hidden activation, Adam optimization, learning rate = 0.01809, batch size = 14, and max_epochs = 799.
Most of my colleagues and I use a manual approach for finding good hyperparameters. We use our experience and intuition. It’s possible to programmatically search for good hyperparameters. Somewhat surprisingly, a random search of hyperparameter values is highly effective compared to more sophisticated techniques, grid search in particular. See “Random Search for Hyper-Parameter Optimization” (2012) by J. Bergstra and Y. Bengio.
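The Bergstra and Bengio result comes down to a trial budget argument: with a fixed number of trials, grid search tests only a few distinct values of each hyperparameter, while random search tests a new value of every hyperparameter on every trial. A minimal sketch of the idea (the budget of 9 trials and the learning rate range are made up for illustration):

```python
import numpy as np

budget = 9  # total trials, imagining two hyperparameters

# grid search: a 3x3 grid uses 9 trials but tests
# only 3 distinct learning rate values
grid_lrs = np.linspace(0.001, 0.10, 3)

# random search: the same 9 trials test 9 distinct
# learning rate values
rnd = np.random.RandomState(1)
rand_lrs = rnd.uniform(0.001, 0.10, size=budget)

print(len(set(grid_lrs.tolist())))  # 3
print(len(set(rand_lrs.tolist())))  # 9
```

If one of the two hyperparameters turns out to barely matter, the grid has effectively wasted two-thirds of its trials on duplicate values of the one that does matter.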
I put together a demo of hyperparameter random search. My demo problem is to predict a person’s political leaning (conservative, moderate, liberal) from sex, age, state, and income. In pseudo-code the idea is:
# loop n times
#   create random arch and train hyperparams
#   use arch params to create net
#   use train params to train net
#   evaluate trained net
#   log params and eval metric to file
# end-loop
# analyze log offline
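The "log params and eval metric to file" step can be as simple as appending one comma-delimited line per trial. A sketch, where the file name, helper name, and field order are my own arbitrary choices, not part of the demo:

```python
# append one comma-delimited record per search trial
def log_trial(fn, trial, params, acc, err):
  (n_hid, activation, opt, lr, bs, max_ep) = params
  line = "%d,%d,%s,%s,%0.5f,%d,%d,%0.4f,%0.4f" % \
    (trial, n_hid, activation, opt, lr, bs, max_ep, acc, err)
  with open(fn, "a") as f:
    f.write(line + "\n")

# example: log the trial 2 result from the demo
log_trial("search_log.txt", 2,
  (16, "tanh", "adam", 0.01809, 14, 799), 0.9500, 0.0312)
```

A flat comma-delimited log is easy to sort by the evaluation metric offline, in a spreadsheet or with a few lines of script.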
I used just two architecture parameters: number of hidden nodes and hidden activation function. The architecture was fixed at two hidden layers.
I used just four training parameters: optimization algorithm, learning rate, batch size, and max epochs. Here’s my demo function that generates random hyperparameters:
def create_params(seed=1):
  # n_hid, activation; opt, lr, bs, max_ep
  rnd = np.random.RandomState(seed)
  n_hid = rnd.randint(6, 21)  # [6, 20]
  activation = ['tanh', 'relu'][rnd.randint(0,2)]
  opt = ['sgd', 'adam'][rnd.randint(0,2)]
  lr = rnd.uniform(low=0.001, high=0.10)
  bs = rnd.randint(6, 16)
  max_ep = rnd.randint(200, 1000)
  return (n_hid, activation, opt, lr, bs, max_ep)
The number of hidden nodes varies from 6 to 20, the learning rate varies from 0.001 to 0.10, and so on. Where do these ranges come from? Just guesses based on experience.
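One possible refinement, not used in the demo but suggested by the Bergstra and Bengio paper, is to sample the learning rate log-uniformly rather than uniformly, so that small values like 0.001 and 0.01 are as likely to be tried as large values near 0.10. A sketch (the helper name is my own):

```python
import numpy as np

def sample_lr_log_uniform(rnd, low=0.001, high=0.10):
  # sample uniformly in log space, then exponentiate,
  # so each order of magnitude gets equal probability
  log_lr = rnd.uniform(np.log(low), np.log(high))
  return float(np.exp(log_lr))

rnd = np.random.RandomState(0)
lr = sample_lr_log_uniform(rnd)
print(lr)  # some value in [0.001, 0.10]
```

With plain `rnd.uniform(0.001, 0.10)`, roughly 90 percent of samples land above 0.01, which under-explores the small learning rates that often work best.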
There are dozens of details, such as how to evaluate a trained network.
So, hyperparameter search isn't a magic wand — you have to use experience to determine which of the hundreds of possible parameters to search, and which of the essentially infinite ranges of parameter values to use.
One of the disadvantages of random search is that you can get ugly results, such as a learning rate of 0.10243568790223344556677123. One way to deal with this issue is to round floating point values to three decimals and integer values to a multiple of 10 before trying them.
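That rounding step can be sketched in a few lines (the helper name is mine, not part of the demo):

```python
# clean up raw random hyperparameter values before use
def round_params(lr, max_ep):
  lr = round(lr, 3)  # e.g. 0.10243568... -> 0.102
  max_ep = int(round(max_ep / 10.0) * 10)  # e.g. 799 -> 800
  return (lr, max_ep)

print(round_params(0.10243568790223344556677123, 799))
# (0.102, 800)
```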

Like many of the older guys I work with, I gained a love of reading from the juvenile "Hardy Boys" mystery series. Several of the books featured a search for something, such as a treasure of some kind. Left: "The Tower Treasure" (#1, 1959 edition). Center: "Hunting for Hidden Gold" (#5, 1963 edition). Right: "The Secret of Pirates' Hill" (#36, 1956 edition). All three covers by artist Rudy Nappi (1923-2015).
Demo code.
# people_hyperparam_search.py
# predict politics type from sex, age, state, income
# PyTorch 1.12.1-CPU Anaconda3-2020.02 Python 3.7.6
# Windows 10/11

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class PeopleDataset(T.utils.data.Dataset):
  # sex  age    state    income   politics
  # -1   0.27   0 1 0    0.7610   2
  # +1   0.19   0 0 1    0.6550   0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D
    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return preds, trgts  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self, n_hid, activation):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, n_hid)  # 6-(nh-nh)-3
    self.hid2 = T.nn.Linear(n_hid, n_hid)
    self.oupt = T.nn.Linear(n_hid, 3)

    if activation == 'tanh':
      self.activ = T.nn.Tanh()
    elif activation == 'relu':
      self.activ = T.nn.ReLU()

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = self.activ(self.hid1(x))
    z = self.activ(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss()
    return z

# -----------------------------------------------------------

def overall_loss_3(model, ds, n_class):
  # MSE using built-in MSELoss() version
  X = ds[0:len(ds)][0]  # all X values
  Y = ds[0:len(ds)][1]  # all targets, ordinal form
  YY = T.nn.functional.one_hot(Y,
    num_classes=n_class).float()  # float for MSELoss()
  with T.no_grad():
    oupt = T.exp(model(X))  # [all,3] probs form
  loss_func = T.nn.MSELoss(reduction='sum')
  loss_val = loss_func(oupt, YY)  # a tensor
  mse = loss_val / len(ds)
  return mse  # as tensor

# -----------------------------------------------------------

def accuracy(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]
  # Y = T.flatten(dataset[0:len(dataset)][1])
  Y = dataset[0:len(dataset)][1]
  with T.no_grad():
    oupt = model(X)  # [N,3] logits
  # (_, arg_maxs) = T.max(oupt, dim=1)
  arg_maxs = T.argmax(oupt, dim=1)  # argmax() is new
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def train(net, ds, opt, lr, bs, me, le):
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.NLLLoss()  # assumes log_softmax()
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)
  # else error

  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()
    if epoch % le == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss))
  print("Training done ")
  return net

# -----------------------------------------------------------

def create_params(seed=1):
  # n_hid, activation; opt, lr, bs, max_ep
  rnd = np.random.RandomState(seed)
  n_hid = rnd.randint(6, 21)  # [6, 20]
  activation = ['tanh', 'relu'][rnd.randint(0,2)]
  opt = ['sgd', 'adam'][rnd.randint(0,2)]
  lr = rnd.uniform(low=0.001, high=0.10)
  bs = rnd.randint(6, 16)
  max_ep = rnd.randint(200, 1000)
  return (n_hid, activation, opt, lr, bs, max_ep)

# -----------------------------------------------------------

def search_params(ds):
  # using Dataset ds
  # loop n times
  #   create random arch and train hyperparams
  #   use arch params to create net
  #   use train params to train net
  #   evaluate trained net
  #   log params and eval metric to file
  # end-loop
  # analyze log offline

  max_trials = 6
  for i in range(max_trials):
    print("\nSearch trial " + str(i))
    (n_hid, activation, opt, lr, bs, max_ep) = \
      create_params(seed=i*2)
    print((n_hid, activation, opt, lr, bs, max_ep))
    net = Net(n_hid, activation).to(device)
    net.train()
    net = train(net, ds, opt, lr, bs, max_ep, le=200)
    net.eval()
    error = overall_loss_3(net, ds, n_class=3).item()
    acc = accuracy(net, ds)
    print("acc = %0.4f  error = %0.4f " % (acc, error))
    # log params, error, accuracy here
  return 0

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin People hyperparameter random search ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset objects
  print("\nCreating People Datasets ")
  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows
  test_file = ".\\Data\\people_test.txt"
  test_ds = PeopleDataset(test_file)  # 40 rows

  # 2. search for good hyperparameters
  search_params(train_ds)

  print("\nEnd People hyperparameter random search ")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()