Neural networks can create powerful prediction systems. But neural networks have many hyperparameters that must be specified, and the hyperparameter values have a huge effect on the effectiveness of the resulting network. Architecture hyperparameters include the number of hidden layers, the number of nodes in each hidden layer, the presence or absence of dropout layers (and the associated dropout rates), the hidden layer activation function, the weight and bias initialization algorithm, and so on. Training hyperparameters include batch size, optimization algorithm, learning rate, and so on.
The number of combinations of hyperparameter values is, quite literally, infinite. When a machine learning engineer develops a neural system, he or she typically starts by manually experimenting with hyperparameter values, using experience and intuition as a guide. But in a production environment, searching for good hyperparameter values is often done programmatically. The two most common techniques are grid search and random search.
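To make the distinction concrete, here is a minimal sketch of grid search versus random search over two hyperparameters. The eval_model() scoring function is a hypothetical stand-in for actually training and evaluating a network; the candidate values are made up for illustration.

```python
# Hypothetical comparison of grid search and random search.
# eval_model() is a dummy stand-in for train-then-score; it is
# constructed so that lr = 0.01 and batch size = 16 is the best pair.
import itertools
import random

def eval_model(lr, bs):
    return abs(lr - 0.01) + abs(bs - 16) / 100  # lower is better

lrs = [0.001, 0.01, 0.1]
batch_sizes = [8, 16, 32]

# grid search: evaluate every combination
grid_best = min(itertools.product(lrs, batch_sizes),
                key=lambda p: eval_model(*p))

# random search: evaluate a fixed budget of random combinations
random.seed(0)
rand_best = min(((random.choice(lrs), random.choice(batch_sizes))
                 for _ in range(5)),
                key=lambda p: eval_model(*p))

print(grid_best)  # (0.01, 16)
```

Grid search is exhaustive but blows up combinatorially; random search covers the same space with a fixed evaluation budget, which is why it is often preferred when each evaluation requires a full training run.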
But I’ve always been a fan of using evolutionary optimization to search for good hyperparameter values. I put together a demo. I used one of my standard synthetic datasets where the goal is to predict a person’s political leaning (conservative = 0, moderate = 1, liberal = 2) based on sex (male = -1, female = +1), age (divided by 100), State (Michigan = 100, Nebraska = 010, Oklahoma = 001), and income (divided by $100,000).
It’s not feasible to parameterize everything in a neural network. For architecture, I fixed the number of hidden layers at 2, and parameterized the number of hidden nodes (2 to 38) and the hidden activation function (tanh() or relu()). For training, I parameterized the batch size, learning rate, maximum epochs, and optimizer (SGD or Adam).
I encoded a set of hyperparameter values as an integer array with 6 cells, where each cell has value 0 to 9. For example, in my demo, a possible set of encoded hyperparameters looks like: [5 1 9 3 9 7]. This represents 22 hidden nodes, tanh() activation, batch size = 20, learning rate = 0.01, 1000 max epochs, and Adam optimization.
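The decoding scheme can be expressed as a small helper function, using the same cell-to-hyperparameter mapping as the demo program:

```python
# Decode a 6-cell integer solution (each cell 0-9) into concrete
# hyperparameter values, mirroring the demo's mapping.
def decode(soln):
    n_hid = (soln[0] * 4) + 2                  # 2, 6, ..., 38
    activ = 'tanh' if soln[1] <= 4 else 'relu'
    bs = (2 * soln[2]) + 2                     # 2, 4, ..., 20
    rates = [0.001, 0.005, 0.008, 0.01, 0.02, 0.03, 0.05,
             0.08, 0.10, 0.12]
    lr = rates[soln[3]]
    me = (100 * soln[4]) + 100                 # 100, 200, ..., 1000
    opt = 'sgd' if soln[5] <= 4 else 'adam'
    return n_hid, activ, bs, lr, me, opt

print(decode([5, 1, 9, 3, 9, 7]))
# (22, 'tanh', 20, 0.01, 1000, 'adam')
```

Because each cell is an integer from 0 to 9, there are exactly 10^6 = 1,000,000 encodable solutions, which makes the search space tractable for an evolutionary algorithm but far too large for exhaustive grid search when each evaluation is a full training run.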
In very high level pseudo-code, evolutionary optimization is:
create a population of n random possible solutions
loop g generations
  select two parents from the population
  create a new child solution from the parents
  mutate the child solution
  check if the child is a new best solution
  replace a weak solution in the population with the child
end-loop
return best solution found
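The pseudo-code above can be made runnable on a toy problem. This sketch searches for the minimum of the sphere function f(x) = sum(x_i^2) instead of a hyperparameter error, but the population layout (a sorted list of (solution, error) tuples, a child made from one good and one weaker parent, replacement of a weak solution) mirrors the demo program:

```python
# Toy evolutionary optimization: minimize the sphere function.
# The structure mirrors the hyperparameter-search demo.
import numpy as np

def sphere(x):                    # error function to minimize
    return float(np.sum(x * x))

rnd = np.random.RandomState(1)
dim, pop_size, max_gen = 6, 8, 100

pop = []                          # list of (solution, error) tuples
for _ in range(pop_size):
    soln = rnd.uniform(-5, 5, dim)
    pop.append((soln, sphere(soln)))
pop.sort(key=lambda t: t[1])      # best (lowest error) first

best_soln, best_err = pop[0][0].copy(), pop[0][1]
for gen in range(max_gen):
    i = rnd.randint(0, pop_size // 2)         # a good parent
    j = rnd.randint(pop_size // 2, pop_size)  # a weaker parent
    # child: left half from one parent, right half from the other
    child = np.concatenate((pop[i][0][:dim // 2], pop[j][0][dim // 2:]))
    for k in range(dim):                      # mutate each cell w/ prob 0.5
        if rnd.random() < 0.5:
            child[k] += rnd.normal(0, 0.5)
    err = sphere(child)
    if err < best_err:                        # new best solution?
        best_soln, best_err = child.copy(), err
    pop[rnd.randint(pop_size // 2, pop_size)] = (child, err)
    pop.sort(key=lambda t: t[1])

print("%0.4f" % best_err)  # decreases toward 0.0 over generations
```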
There are many design choices for evolutionary optimization, such as how to select the two parents, how to combine the parents to create a child, how to mutate the child, and so on. One obvious enhancement is to maintain a Dictionary object of the potential solutions (sets of hyperparameters) that have already been evaluated, in order to prevent duplicated effort. Another easy design option to explore is the child creation mechanism, for example by selecting a random crossover index instead of always using the middle index.
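Both of those design options can be sketched in isolation. This hypothetical snippet caches the error of every solution already evaluated, keyed by the tuple of its cells, and picks a random crossover index when creating a child:

```python
# Two design-option sketches: an evaluation cache, and single-point
# crossover at a random index instead of the middle index.
import numpy as np

rnd = np.random.RandomState(1)
cache = {}  # tuple(soln cells) -> previously computed error

def cached_evaluate(soln, evaluate):
    key = tuple(int(v) for v in soln)
    if key not in cache:
        cache[key] = evaluate(soln)  # expensive call happens only once
    return cache[key]

def make_child(parent1, parent2, dim):
    xi = rnd.randint(1, dim)  # random crossover index in [1, dim-1]
    child = np.zeros(dim, dtype=int)
    child[:xi] = parent1[:xi]
    child[xi:] = parent2[xi:]
    return child

p1 = np.array([5, 1, 9, 3, 9, 7])
p2 = np.array([0, 8, 2, 6, 4, 1])
child = make_child(p1, p2, 6)
# dummy evaluate function stands in for a full train-and-score run
e = cached_evaluate(child, lambda s: float(np.sum(s)))
```

The cache matters because training a network is by far the most expensive step; re-evaluating a duplicate child wastes an entire training run.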
Hyperparameter search using evolutionary optimization isn’t magic, but for the type of work I do, EO is often highly effective. And super interesting!

I’ve always been fascinated by the history of the evolution of submarines. As a young man I built many scale models and submarines were one of my favorite things to draw.
Left: The USS Plunger (SS-2, “submersible ship #2”) was commissioned in 1903. It was arguably the first (barely) practical, modern submarine. It was 64 feet long, had a crew of 7, a single torpedo tube, and could submerge to approximately 75 feet.
Right: The USS Plunger (SSN-595) was commissioned in 1962. It was a nuclear-powered attack submarine with fearsome capabilities. It was 278 feet long, had a crew of 100, four torpedo tubes, and could submerge to approximately 1,300 feet.
Demo code below. Not thoroughly tested; do not use for production. Training and test data at https://jamesmccaffreyblog.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_evo_hyperparameter.py
# PyTorch 2.0.1-CPU Anaconda3-2022.10 Python 3.9.13
# Windows 10/11
import numpy as np
import torch as T
device = T.device('cpu') # apply to Tensor or Module
# -----------------------------------------------------------
class PeopleDataset(T.utils.data.Dataset):
  # sex  age    state    income  politics
  # -1   0.27   0 1 0    0.7610  2
  # +1   0.19   0 0 1    0.6550  0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D
    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return preds, trgts  # as a Tuple
# -----------------------------------------------------------
class Net(T.nn.Module):
  def __init__(self, n_hid, activ='tanh'):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, n_hid)  # 6-(nh-nh)-3
    self.hid2 = T.nn.Linear(n_hid, n_hid)
    self.oupt = T.nn.Linear(n_hid, 3)
    if activ == 'tanh':
      self.activ = T.nn.Tanh()
    elif activ == 'relu':
      self.activ = T.nn.ReLU()
    # use default weight init

  def forward(self, x):
    z = self.activ(self.hid1(x))
    z = self.activ(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # for NLLLoss()
    return z
# -----------------------------------------------------------
def accuracy_quick(model, dataset):
  # assumes model.eval() has been set
  X = dataset[0:len(dataset)][0]
  Y = dataset[0:len(dataset)][1]
  with T.no_grad():
    oupt = model(X)  # all log-pseudo-probs at once
  arg_maxs = T.argmax(oupt, dim=1)
  num_correct = T.sum(Y == arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()
# -----------------------------------------------------------
def train(net, ds, bs, lr, me, opt='sgd', verbose=False):
  # dataset, bat_size, lrn_rate, max_epochs, optimizer
  v = verbose
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.NLLLoss()  # assumes log_softmax() activation
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)

  if v: print("\nStarting training ")
  le = me // 4  # log interval: 4 log prints
  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()
    if v:
      if epoch % le == 0:
        print("epoch = %5d  |  loss = %10.4f" % \
          (epoch, epoch_loss))
  if v: print("Done ")
# -----------------------------------------------------------
def evaluate(soln, trn_ds, tst_ds, verbose=False):
  # soln = [n_hid, activ, bs, lr, me, opt]
  #          [0]    [1]   [2] [3] [4] [5]
  v = verbose

  # map soln cell values to hyperparameter values
  n_hid = (soln[0] * 4) + 2  # 2, 6, . . 38
  if soln[1] <= 4: activ = 'tanh'
  else: activ = 'relu'
  bs = int((2 * soln[2]) + 2)  # 2, 4, . . 20
  rates = [0.001, 0.005, 0.008, 0.01, 0.02, 0.03, 0.05,
    0.08, 0.10, 0.12]
  lr = rates[soln[3]]
  me = (100 * soln[4]) + 100  # 100, 200, . . 1000
  if soln[5] <= 4: opt = 'sgd'
  else: opt = 'adam'

  T.manual_seed(1)  # prepare
  np.random.seed(1)
  net = Net(n_hid, activ).to(device)  # create
  net.train()
  train(net, trn_ds, bs, lr, me, opt, verbose)  # train
  net.eval()
  acc_train = accuracy_quick(net, trn_ds)  # evaluate
  acc_test = accuracy_quick(net, tst_ds)
  acc_avg = (acc_train + acc_test) / 2
  error = 1.0 - acc_avg  # in [0.0, 1.0]
  if v: print("train acc = %0.4f " % acc_train)
  if v: print("test acc = %0.4f " % acc_test)
  return error
# -----------------------------------------------------------
def show_soln_to_hyperparams(soln):
  n_hid = (soln[0] * 4) + 2
  if soln[1] <= 4: activ = 'tanh'
  else: activ = 'relu'
  bs = int((2 * soln[2]) + 2)  # 2, 4, . . 20
  rates = [0.001, 0.005, 0.008, 0.01, 0.02, 0.03, 0.05,
    0.08, 0.10, 0.12]
  lr = rates[soln[3]]
  me = (100 * soln[4]) + 100  # 100, 200, . . 1000
  if soln[5] <= 4: opt = 'sgd'
  else: opt = 'adam'

  print("num hidden nodes = " + str(n_hid))
  print("hidden activation = " + str(activ))
  print("batch size = " + str(bs))
  print("learn rate = %0.4f " % lr)
  print("max epochs = " + str(me))
  print("optimizer = " + str(opt))
# -----------------------------------------------------------
def main():
  # 0. get started
  print("\nBegin People politics evolutionary parameter search ")
  T.manual_seed(1)  # is reset in evaluate()
  np.random.seed(1)
  rnd = np.random.RandomState(1)  # controls initial population

  # 1. create Dataset objects
  print("\nCreating People Datasets ")
  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows
  test_file = ".\\Data\\people_test.txt"
  test_ds = PeopleDataset(test_file)  # 40 rows

  # 2. create population of possible solutions (hyperparams)
  pop_size = 8
  dim = 6
  print("\nCreating population of " + \
    str(pop_size) + " possible solns ")
  pop = []  # list of tuples; each tuple is (np arr soln, float err)
  for i in range(pop_size):  # soln-err / set of hyperparams
    soln = rnd.randint(low=0, high=10, size=dim, dtype=int)
    err = evaluate(soln, train_ds, test_ds, verbose=False)
    pop.append((soln, err))
  pop = sorted(pop, key=lambda tup: tup[1])  # low err to high

  # 3. find best initial set of hyperparams
  best_soln = pop[0][0].copy()
  best_err = pop[0][1]
  print("\nBest initial soln: ")
  print(best_soln)
  print("Best initial error = %0.4f " % best_err)
  print("Best initial avg acc = %0.4f " % (1 - best_err))

  # 4. evolve
  print("\nBegin evolution ")
  max_gen = 7
  for gen in range(max_gen):
    print("\ngeneration = " + str(gen))

    # 4a. pick two parents
    first = rnd.randint(0, pop_size // 2)  # a good soln
    second = rnd.randint(pop_size // 2, pop_size)  # a weaker soln
    flip = rnd.randint(2)  # 0 or 1
    if flip == 0:
      parent_idxs = (first, second)
    else:
      parent_idxs = (second, first)

    # 4b. create child
    child_soln = np.zeros(dim, dtype=int)
    i = parent_idxs[0]; j = parent_idxs[1]
    parent1 = pop[i][0]
    parent2 = pop[j][0]
    for k in range(0, dim // 2):  # left half from parent1
      child_soln[k] = parent1[k]
    for k in range(dim // 2, dim):  # right half from parent2
      child_soln[k] = parent2[k]

    # 4c. mutate child
    mutate_prob = 0.5
    for k in range(dim):
      q = rnd.random()  # [0.0, 1.0]
      if q < mutate_prob:
        child_soln[k] = rnd.randint(0, 10)
    child_err = evaluate(child_soln, train_ds, test_ds,
      verbose=True)
    print(child_soln)
    print("%0.4f " % child_err)

    # 4d. is child a new best soln?
    if child_err < best_err:
      print("New best soln found at gen " + str(gen))
      best_soln = child_soln.copy()
      best_err = child_err

    # 4e. replace a weak pop soln with child
    idx = rnd.randint(pop_size // 2, pop_size)
    pop[idx] = (child_soln, child_err)

    # 4f. sort solns from best to worst
    pop = sorted(pop, key=lambda tup: tup[1])
  print("\nEnd evolution ")

  # 5. show best hyperparameters found
  print("\nFinal best soln found: ")
  print(best_soln)
  print("\nFinal best hyperparameters found: ")
  show_soln_to_hyperparams(best_soln)
  print("\nFinal best error = %0.4f " % best_err)
  print("Final best avg acc = %0.4f " % (1 - best_err))
  print("\nEnd evolutionary parameter search ")

if __name__ == "__main__":
  main()
