Bear with me for a moment — it’s difficult to explain the topic of this blog post. I’ll work backwards from the output of a demo program:
. . .

End evolution

Final best soln found:
[9 6 4 9 5 4]

Final best hyperparameters found:
num hidden nodes = 20
hidden activation = relu
batch size = 8
learn rate = 0.1200
max epochs = 500
optimizer = sgd

Final best error = 0.1300
Final best weighted acc = 0.8700

Final best train acc = 0.9300
Final best test acc = 0.8500

End evolutionary parameter search
The problem is to programmatically find a good set of hyperparameters (number of hidden nodes, learning rate, and so on) for a neural network. The demo program used evolutionary optimization, and found hyperparameter values that give 93% accuracy on a set of training data and 85% accuracy on a set of test data. These are very good results.
For simple neural networks and datasets, it’s usually possible to manually search for a good set of hyperparameter values. But for complex neural networks (especially those with a Transformer component) and/or with large datasets, it’s usually necessary to programmatically search. Grid search and random search sometimes work well but I prefer using evolutionary optimization (EO):
create a population of N random solutions (hyperparam values)
loop max_gen generations
  select two parent solutions
  make a child solution
  mutate child slightly
  replace a weak soln in population with new child
end-loop
return best solution found
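The loop above can be sketched as a minimal, self-contained program. This toy version minimizes a made-up error function (the sum of the solution's digits) rather than training a neural network, and the function and parameter names here are illustrative, not the demo program's exact implementation:

```python
import numpy as np

def evolve(error_fn, dim, pop_size=10, max_gen=20, seed=1):
  # minimal evolutionary optimization sketch (toy example)
  rnd = np.random.RandomState(seed)
  pop = [rnd.randint(0, 10, size=dim) for _ in range(pop_size)]
  pop = sorted(pop, key=error_fn)  # best (lowest error) first
  for gen in range(max_gen):
    i = rnd.randint(0, pop_size // 2)          # a good parent
    j = rnd.randint(pop_size // 2, pop_size)   # a weaker parent
    # crossover: left half from one parent, right half from other
    child = np.concatenate((pop[i][:dim // 2], pop[j][dim // 2:]))
    for k in range(dim):                       # mutate slightly
      if rnd.random() < 0.2:
        child[k] = rnd.randint(0, 10)
    pop[pop_size - 1] = child                  # replace the weakest
    pop = sorted(pop, key=error_fn)
  return pop[0]

best = evolve(lambda s: int(np.sum(s)), dim=6)
print(best)
```

Because the best solution is never replaced, the best error found can only stay the same or improve from generation to generation.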
There are many, many design decisions. My latest design incorporates functionality to prevent duplicate solutions from being evaluated (which, although conceptually simple, required an annoyingly large amount of code). A solution (set of hyperparameter values) looks like [9, 0, 2, 2, 4, 3]. For each solution, I created a string key like "902243" and added it to a Dictionary collection with a dummy value of 1.
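The key-based duplicate check can be sketched like this (a simplified version of what the demo does; in Python a Dictionary collection is just a dict, and the helper names here are illustrative):

```python
import numpy as np

rnd = np.random.RandomState(1)
used = {}  # keys of solutions that have already been evaluated

def soln_key(soln):
  # [9, 0, 2, 2, 4, 3] -> "902243"
  return "".join(str(x) for x in soln)

def make_unique_soln(dim, rnd, used):
  soln = rnd.randint(0, 10, size=dim)
  while soln_key(soln) in used:   # regenerate until novel
    soln = rnd.randint(0, 10, size=dim)
  used[soln_key(soln)] = 1        # dummy value; only the key matters
  return soln

a = make_unique_soln(6, rnd, used)
b = make_unique_soln(6, rnd, used)
print(soln_key(a), soln_key(b))
```

A Python set would work equally well here; the dict-with-dummy-value pattern mirrors how a Dictionary collection is typically used for membership testing.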
Also, I weighted solution goodness by counting accuracy on test data 3 times as heavily as accuracy on training data (to avoid situations with near 100% accuracy on training data but only so-so accuracy on test data). For example, the best solution found in the demo had 0.93 accuracy on training data and 0.85 accuracy on test data, so the weighted accuracy is ((1 * 0.93) + (3 * 0.85)) / 4 = 0.87.
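As a quick check of the arithmetic, using the train and test accuracy values from the demo run:

```python
acc_train = 0.93  # final best train accuracy from the demo run
acc_test = 0.85   # final best test accuracy from the demo run
# test accuracy is weighted 3x relative to train accuracy
acc_weighted = ((1 * acc_train) + (3 * acc_test)) / 4
print("%0.4f" % acc_weighted)  # 0.8700
```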
My demo uses one of my standard synthetic datasets. The goal is to predict a person’s sex (male = 0, female = 1) from age, State of residence (Michigan, Nebraska, Oklahoma), annual income, and political leaning (conservative, moderate, liberal).
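Based on the comment block in the Dataset class below, each raw data line holds sex (0 or 1), age divided by 100, a one-hot state, income (presumably divided by 100,000, though that scaling is my assumption), and a one-hot political leaning. A hypothetical decode of the first sample row:

```python
# decode one encoded data row (illustrative; assumes age / 100
# and income / 100,000 normalization)
row = [0, 0.27, 0, 1, 0, 0.7610, 0, 0, 1]
sex = "male" if row[0] == 0 else "female"
age = round(row[1] * 100)
state = ["michigan", "nebraska", "oklahoma"][row[2:5].index(1)]
income = round(row[5] * 100_000)
politics = ["conservative", "moderate", "liberal"][row[6:9].index(1)]
print(sex, age, state, income, politics)
```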
My demo code works quite well but it’s complex. Sometimes difficult problems require complex solutions.

There’s a huge amount of research that examines the many big differences between males and females. Women tend to be more sociable, sensitive, warm, compassionate, polite, anxious, self-doubting, and more open to aesthetics. Men tend to be more dominant, assertive, risk-prone, thrill-seeking, tough-minded, emotionally stable, utilitarian, and open to abstract ideas. In short, men prefer working mostly alone with things and abstract ideas (like computer programming) and women prefer working in groups with verbal communication (like administrative assistants). These sex differences appear starting at early childhood and are firmly entrenched by teen years. See blogs.scientificamerican.com/beautiful-minds/taking-sex-differences-in-personality-seriously/ for a nice summary of the research.
The complete demo code is below. The training and test data can be found at https://jamesmccaffreyblog.com/2022/09/23/binary-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_gender_evo_hyperparams.py
# binary classification, evolutionary hyperparam search
# PyTorch 2.0.0-CPU Anaconda3-2022.10 Python 3.9.13
# Windows 10/11
import numpy as np
import torch as T
device = T.device('cpu') # apply to Tensor or Module
class PeopleDataset(T.utils.data.Dataset):
  # sex  age    state    income   politics
  #  0   0.27   0 1 0    0.7610   0 0 1
  #  1   0.19   0 0 1    0.6550   1 0 0
  # sex: 0 = male, 1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_data = np.loadtxt(src_file, usecols=range(0,9),
      delimiter="\t", comments="#", dtype=np.float32)
    self.x_data = T.tensor(all_data[:,1:9],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,0],
      dtype=T.float32).to(device)  # float32 required
    self.y_data = self.y_data.reshape(-1,1)  # 2-D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    feats = self.x_data[idx,:]  # idx row, all 8 cols
    sex = self.y_data[idx,:]    # idx row, the only col
    return feats, sex  # as a Tuple
# -----------------------------------------------------------
class Net(T.nn.Module):
  def __init__(self, n_hid, activ):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, n_hid)  # like 8-(10-10)-1
    self.hid2 = T.nn.Linear(n_hid, n_hid)
    self.oupt = T.nn.Linear(n_hid, 1)
    if activ == 'tanh':
      self.activ = T.nn.Tanh()
    elif activ == 'relu':
      self.activ = T.nn.ReLU()

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = self.activ(self.hid1(x))
    z = self.activ(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # for BCELoss()
    return z
# -----------------------------------------------------------
def accuracy_q(model, ds):
  inpts = ds[:][0]    # all input rows
  targets = ds[:][1]  # all targets, 0s and 1s
  with T.no_grad():
    oupts = model(inpts)  # all computed outputs
  pred_y = oupts >= 0.5   # tensor of 0s and 1s
  num_correct = T.sum(targets == pred_y)
  acc = (num_correct.item() * 1.0 / len(ds))  # scalar
  return acc
# -----------------------------------------------------------
def metrics(model, ds, thresh=0.5):
  # note: N = total number of items = TP + FP + TN + FN
  # accuracy  = (TP + TN) / N
  # precision = TP / (TP + FP)
  # recall    = TP / (TP + FN)
  # F1 = 2 / [(1 / precision) + (1 / recall)]
  tp = 0; tn = 0; fp = 0; fn = 0
  for i in range(len(ds)):
    inpts = ds[i][0]   # dictionary style
    target = ds[i][1]  # float32 [0.0] or [1.0]
    target = target.int()  # int 0 or 1
    with T.no_grad():
      p = model(inpts)  # between 0.0 and 1.0
    # should really avoid 'target == 1' style equality check
    if target == 1 and p >= thresh:    # TP
      tp += 1
    elif target == 1 and p < thresh:   # FN
      fn += 1
    elif target == 0 and p < thresh:   # TN
      tn += 1
    elif target == 0 and p >= thresh:  # FP
      fp += 1

  N = tp + fp + tn + fn
  if N != len(ds):
    print("FATAL LOGIC ERROR in metrics()")

  accuracy = (tp + tn) / (N * 1.0)
  precision = (1.0 * tp) / (tp + fp)  # assumes tp + fp != 0
  recall = (1.0 * tp) / (tp + fn)     # assumes tp + fn != 0
  f1 = 2.0 / ((1.0 / precision) + (1.0 / recall))
  return (accuracy, precision, recall, f1)  # as a Tuple
# -----------------------------------------------------------
def train(net, ds, bs, lr, me, opt='sgd', verbose=False):
  # dataset, bat_size, lrn_rate, max_epochs, optimizer
  v = verbose
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.BCELoss()  # assumes sigmoid output activation
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)

  if v: print("\nStarting training ")
  le = me // 4  # log interval: n log prints
  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      y = batch[1]  # correct target sex, 0.0 or 1.0
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()
    if v:
      if epoch % le == 0:
        print("epoch = %5d  |  loss = %10.4f" % \
          (epoch, epoch_loss))
  if v: print("Done ")
# -----------------------------------------------------------
def evaluate(soln, trn_ds, tst_ds, verbose=False):
  # compute the meta error of a soln
  # [n_hid, activ, bs, lr, me, opt]
  #   [0]    [1]  [2] [3] [4] [5]
  v = verbose
  n_hids = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
  activs = ['tanh', 'tanh', 'tanh', 'tanh', 'tanh',
            'relu', 'relu', 'relu', 'relu', 'relu']
  b_szs = [1, 2, 4, 6, 8, 10, 12, 14, 16, 20]
  rates = [0.001, 0.005, 0.008, 0.01, 0.02, 0.03, 0.05,
           0.08, 0.10, 0.12]
  max_eps = [50, 100, 200, 300, 400, 500, 600, 700, 800, 1000]
  opts = ['sgd', 'sgd', 'sgd', 'sgd', 'sgd',
          'adam', 'adam', 'adam', 'adam', 'adam']

  n_hid = n_hids[soln[0]]
  activ = activs[soln[1]]
  bs = b_szs[soln[2]]
  lr = rates[soln[3]]
  me = max_eps[soln[4]]
  opt = opts[soln[5]]

  T.manual_seed(1)  # prepare
  np.random.seed(1)
  net = Net(n_hid, activ).to(device)  # create NN
  net.train()
  train(net, trn_ds, bs, lr, me, opt, verbose)  # train NN
  net.eval()
  acc_train = accuracy_q(net, trn_ds)  # evaluate NN accuracy
  acc_test = accuracy_q(net, tst_ds)
  acc_weighted = ((1 * acc_train) + (3 * acc_test)) / 4
  error = 1.0 - acc_weighted  # in [0.0, 1.0]
  if v: print("train acc = %0.4f " % acc_train)
  if v: print("test acc = %0.4f " % acc_test)
  return (acc_train, acc_test, error)
# -----------------------------------------------------------
def show_soln_to_hyperparams(soln):
  n_hids = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
  activs = ['tanh', 'tanh', 'tanh', 'tanh', 'tanh',
            'relu', 'relu', 'relu', 'relu', 'relu']
  b_szs = [1, 2, 4, 6, 8, 10, 12, 14, 16, 20]
  rates = [0.001, 0.005, 0.008, 0.01, 0.02, 0.03, 0.05,
           0.08, 0.10, 0.12]
  max_eps = [50, 100, 200, 300, 400, 500, 600, 700, 800, 1000]
  opts = ['sgd', 'sgd', 'sgd', 'sgd', 'sgd',
          'adam', 'adam', 'adam', 'adam', 'adam']

  n_hid = n_hids[soln[0]]
  activ = activs[soln[1]]
  bs = b_szs[soln[2]]
  lr = rates[soln[3]]
  me = max_eps[soln[4]]
  opt = opts[soln[5]]

  print("num hidden nodes = " + str(n_hid))
  print("hidden activation = " + str(activ))
  print("batch size = " + str(bs))
  print("learn rate = %0.4f " % lr)
  print("max epochs = " + str(me))
  print("optimizer = " + str(opt))
# -----------------------------------------------------------
def make_random_soln(dim, rnd):
  soln = rnd.randint(low=0, high=10, size=dim, dtype=int)
  return soln

def mutate(child_soln, mutate_prob, rnd):
  for k in range(len(child_soln)):
    q = rnd.random()  # in [0.0, 1.0)
    if q < mutate_prob:
      child_soln[k] = rnd.randint(0, 10, size=1, dtype=int)
  return
# -----------------------------------------------------------
def main():
  # 0. get started
  print("\nPeople gender evolutionary hyperparam search ")
  T.manual_seed(1)
  np.random.seed(1)
  rnd = np.random.RandomState(1)  # controls initial pop

  # 1. create Dataset objects
  print("\nCreating People train and test Datasets ")
  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows
  test_ds = PeopleDataset(test_file)    # 40 rows

  # 2. create pop. of possible solutions (hyperparams)
  pop_size = 10
  dim = 6  # number of hyperparameters to examine
  max_gen = 12
  used = {}  # set of hyperparams that have been evaluated
  print("\nCreating population of " + \
    str(pop_size) + " random possible solns ")
  pop = []  # list of tuples, tuple is (np arr, float)
  for i in range(pop_size):  # each set of hyperparams
    soln = make_random_soln(dim, rnd)
    soln_key = "".join(str(x) for x in soln)
    while soln_key in used:
      soln = make_random_soln(dim, rnd)
      soln_key = "".join(str(x) for x in soln)
    used[soln_key] = 1
    trn_acc, tst_acc, err = evaluate(soln, train_ds, test_ds,
      verbose=True)
    pop.append((soln, err))
  pop = sorted(pop, key=lambda tup: tup[1])  # low err to high

  # 3. find best set of initial hyperparams
  best_soln = pop[0][0].copy()
  best_err = pop[0][1]
  print("\nBest initial soln: ")
  print(best_soln)
  print("Best initial weighted error = %0.4f " % best_err)
  print("Best initial weighted acc = %0.4f " % (1 - best_err))

# -----------------------------------------------------------

  # 4. evolve
  print("\nBegin evolution ")
  for gen in range(max_gen):
    print("\ngeneration = " + str(gen))

    # 4a. pick two parents
    first = rnd.randint(0, pop_size // 2)          # good one
    second = rnd.randint(pop_size // 2, pop_size)  # weaker
    flip = rnd.randint(2)  # 0 or 1
    if flip == 0:
      parent_idxs = (first, second)
    else:
      parent_idxs = (second, first)

    # 4b. create child
    child_soln = np.zeros(dim, dtype=int)
    i = parent_idxs[0]; j = parent_idxs[1]
    parent1 = pop[i][0]
    parent2 = pop[j][0]
    for k in range(0, dim // 2):    # left half
      child_soln[k] = parent1[k]
    for k in range(dim // 2, dim):  # right half
      child_soln[k] = parent2[k]

    # 4c. mutate child, avoid duplicate
    mutate_prob = 0.5
    mutate(child_soln, mutate_prob, rnd)
    child_soln_key = "".join(str(x) for x in child_soln)
    while child_soln_key in used:
      mutate(child_soln, mutate_prob, rnd)
      child_soln_key = "".join(str(x) for x in child_soln)
    used[child_soln_key] = 1
    trn_acc, tst_acc, child_err = evaluate(child_soln,
      train_ds, test_ds, verbose=True)
    print(child_soln)
    print("child err = %0.4f " % child_err)

    # 4d. is child new best soln?
    if child_err < best_err:
      print("New best soln found at gen " + str(gen))
      best_soln = child_soln.copy()
      best_err = child_err
    else:
      # print("No improvement at gen " + str(gen))
      pass

    # 4e. replace a weak pop soln with child
    idx = rnd.randint(pop_size // 2, pop_size)
    pop[idx] = (child_soln, child_err)  # Tuple

    # 4f. sort solns from best (low error) to worst
    pop = sorted(pop, key=lambda tup: tup[1])
  print("\nEnd evolution ")

# -----------------------------------------------------------

  # 5. show best hyperparameters found
  print("\nFinal best soln found: ")
  print(best_soln)
  print("\nFinal best hyperparameters found: ")
  show_soln_to_hyperparams(best_soln)
  print("\nFinal best error = %0.4f " % best_err)
  print("Final best weighted acc = %0.4f " % (1 - best_err))
  train_acc, test_acc, _ = evaluate(best_soln, train_ds, test_ds)
  print("\nFinal best train acc = %0.4f " % train_acc)
  print("Final best test acc = %0.4f " % test_acc)
  print("\nEnd evolutionary parameter search ")

# -----------------------------------------------------------

  # 6. TODO: save model

if __name__ == "__main__":
  main()
