Hyperparameter Search Using Evolutionary Optimization for a PyTorch Binary Classifier

Bear with me for a moment — it’s difficult to explain the topic of this blog post. I’ll work backwards from the output of a demo program:

. . .
End evolution

Final best soln found:
[9 6 4 9 5 4]

Final best hyperparameters found:
num hidden nodes = 20
hidden activation = relu
batch size = 8
learn rate = 0.1200
max epochs = 500
optimizer = sgd

Final best error = 0.1300
Final best weighted acc = 0.8700

Final best train acc = 0.9300
Final best test acc = 0.8500

End evolutionary parameter search

The problem is to programmatically find a good set of hyperparameters (number of hidden nodes, learning rate, and so on) for a neural network. The demo program used evolutionary optimization and found values that give 93% accuracy on the training data and 85% accuracy on a set of held-out test data — very good results.

For simple neural networks and datasets, it’s usually possible to manually search for a good set of hyperparameter values. But for complex neural networks (especially those with a Transformer component) and/or with large datasets, it’s usually necessary to programmatically search. Grid search and random search sometimes work well but I prefer using evolutionary optimization (EO):

create a population of N random solutions (hyperparam values)
loop max_gen generations
  select two parent solutions
  make a child solution
  mutate child slightly
  replace a weak soln in population with new child
end-loop
return best solution found
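The pseudo-code above can be sketched as a generic EO loop. This is a minimal illustration, not the demo program itself: the toy_error() function stands in for the expensive train-and-evaluate step, and the population size, generation count, and mutation probability are illustrative values.

```python
import numpy as np

def toy_error(soln):
  # stand-in for an expensive train-and-evaluate call;
  # error is 0 when every gene equals 5
  return float(np.sum((soln - 5) ** 2))

rnd = np.random.RandomState(1)
pop_size, dim, max_gen = 10, 6, 40

# population: list of (solution, error) tuples, sorted low error first
pop = []
for _ in range(pop_size):
  s = rnd.randint(0, 10, size=dim)
  pop.append((s, toy_error(s)))
pop.sort(key=lambda t: t[1])

for gen in range(max_gen):
  i = rnd.randint(0, pop_size // 2)          # a fit parent
  j = rnd.randint(pop_size // 2, pop_size)   # a weaker parent
  p1, p2 = pop[i][0], pop[j][0]
  child = np.concatenate([p1[:dim // 2], p2[dim // 2:]])  # crossover
  for k in range(dim):                       # mutate each gene w.p. 0.2
    if rnd.random() < 0.2:
      child[k] = rnd.randint(0, 10)
  pop[rnd.randint(pop_size // 2, pop_size)] = (child, toy_error(child))
  pop.sort(key=lambda t: t[1])               # re-sort: best at front

best_soln, best_err = pop[0]
print(best_soln, best_err)
```

In the demo, the only change is that toy_error() becomes "train a network with these hyperparameters and return the weighted error."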

There are many, many design decisions. My latest design incorporates functionality to prevent duplicate solutions from being evaluated (which, although conceptually simple, required an annoyingly large amount of code). A solution (set of hyperparameter values) looks like [9, 0, 2, 2, 4, 3]. For each solution, I created a string key like "902243" and added it to a Dictionary collection with a dummy value of 1.
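The key-based duplicate check is a minimal sketch in isolation (in Python the Dictionary collection is a plain dict, so membership tests are O(1)):

```python
# each evaluated solution is converted to a string key and stored
# in a dictionary with a dummy value of 1
used = {}

def make_key(soln):
  return "".join(str(x) for x in soln)

soln = [9, 0, 2, 2, 4, 3]
key = make_key(soln)    # "902243"
if key not in used:     # only evaluate solutions not seen before
  used[key] = 1         # mark as evaluated
print(key in used)      # True
```

Because each hyperparameter index is a single digit 0-9, concatenating the digits gives a unique key for every possible solution.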

Also, I weighted solution goodness by counting accuracy on test data three times as heavily as accuracy on training data (to avoid situations with near 100% accuracy on training data but only so-so accuracy on test data). For example, the best solution found in the demo had 0.93 accuracy on training data and 0.85 accuracy on test data, so the weighted accuracy is ((1 * 0.93) + (3 * 0.85)) / 4 = 0.87.
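The weighted-accuracy arithmetic, checked numerically with the demo's values:

```python
def weighted_acc(acc_train, acc_test):
  # test accuracy is weighted 3x relative to train accuracy
  return ((1 * acc_train) + (3 * acc_test)) / 4

wa = weighted_acc(0.93, 0.85)
print("%0.4f" % wa)  # 0.8700
```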

My demo uses one of my standard synthetic datasets. The goal is to predict a person’s sex (male = 0, female = 1) from age, State of residence (Michigan, Nebraska, Oklahoma), annual income, and political leaning (conservative, moderate, liberal).
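A sketch of the data encoding, assuming age is divided by 100 and income by 100,000 (which is consistent with the sample rows shown in the PeopleDataset comments in the code below); state and politics are one-hot encoded:

```python
# hypothetical helper to illustrate the encoding of one data row
def encode_person(sex, age, state, income, politics):
  states = {'michigan': [1,0,0], 'nebraska': [0,1,0], 'oklahoma': [0,0,1]}
  pols = {'conservative': [1,0,0], 'moderate': [0,1,0], 'liberal': [0,0,1]}
  label = 0 if sex == 'male' else 1          # sex: 0 = male, 1 = female
  return [label, age / 100.0] + states[state] + \
    [income / 100_000.0] + pols[politics]

row = encode_person('male', 27, 'nebraska', 76100, 'liberal')
print(row)  # [0, 0.27, 0, 1, 0, 0.761, 0, 0, 1]
```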

My demo code works quite well but it’s complex. Sometimes difficult problems require complex solutions.



There’s a huge amount of research that examines the many big differences between males and females. Women tend to be more sociable, sensitive, warm, compassionate, polite, anxious, self-doubting, and more open to aesthetics. Men tend to be more dominant, assertive, risk-prone, thrill-seeking, tough-minded, emotionally stable, utilitarian, and open to abstract ideas. In short, men prefer working mostly alone with things and abstract ideas (like computer programming) and women prefer working in groups with verbal communication (like administrative assistants). These sex differences appear starting at early childhood and are firmly entrenched by teen years. See blogs.scientificamerican.com/beautiful-minds/taking-sex-differences-in-personality-seriously/ for a nice summary of the research.


Complete demo code below. The training and test data are at https://jamesmccaffreyblog.com/2022/09/23/binary-classification-using-pytorch-1-12-1-on-windows-10-11/.

# people_gender_evo_hyperparams.py
# binary classification, evolutionary hyperparam search
# PyTorch 2.0.0-CPU Anaconda3-2022.10  Python 3.9.13
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

class PeopleDataset(T.utils.data.Dataset):
  # sex age   state  income  politics
  #  0  0.27  0 1 0  0.7610  0 0 1
  #  1  0.19  0 0 1  0.6550  1 0 0
  # sex: 0 = male, 1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_data = np.loadtxt(src_file, usecols=range(0,9),
      delimiter="\t", comments="#", dtype=np.float32) 

    self.x_data = T.tensor(all_data[:,1:9],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,0],
      dtype=T.float32).to(device)  # float32 required

    self.y_data = self.y_data.reshape(-1,1)  # 2-D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    feats = self.x_data[idx,:]  # idx row, all 8 cols
    sex = self.y_data[idx,:]    # idx row, the only col
    return feats, sex  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self, n_hid, activ):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, n_hid)  # like 8-(10-10)-1
    self.hid2 = T.nn.Linear(n_hid, n_hid)
    self.oupt = T.nn.Linear(n_hid, 1)

    if activ == 'tanh':
      self.activ = T.nn.Tanh()
    elif activ == 'relu':
      self.activ = T.nn.ReLU()

    T.nn.init.xavier_uniform_(self.hid1.weight) 
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight) 
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight) 
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = self.activ(self.hid1(x))
    z = self.activ(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # for BCELoss()
    return z

# -----------------------------------------------------------

def accuracy_q(model, ds):
  inpts = ds[:][0]      # all input rows
  targets = ds[:][1]    # all targets 0s and 1s
  with T.no_grad():
    oupts = model(inpts)       # all computed outputs
  pred_y = oupts >= 0.5        # tensor of True/False
  num_correct = T.sum(targets==pred_y)
  acc = (num_correct.item() * 1.0 / len(ds))  # scalar
  return acc

# -----------------------------------------------------------

def metrics(model, ds, thresh=0.5):
  # note: N = total number of items = TP + FP + TN + FN
  # accuracy  = (TP + TN)  / N
  # precision = TP / (TP + FP)
  # recall    = TP / (TP + FN)
  # F1        = 2 / [(1 / precision) + (1 / recall)]

  tp = 0; tn = 0; fp = 0; fn = 0
  for i in range(len(ds)):
    inpts = ds[i][0]         # dictionary style
    target = ds[i][1]        # float32  [0.0] or [1.0]
    target = target.int()    # int 0 or 1
    with T.no_grad():
      p = model(inpts)       # between 0.0 and 1.0

    if target == 1 and p >= thresh:    # TP
      tp += 1
    elif target == 1 and p < thresh:   # FN
      fn += 1
    elif target == 0 and p < thresh:   # TN
      tn += 1
    elif target == 0 and p >= thresh:  # FP
      fp += 1

  N = tp + fp + tn + fn
  if N != len(ds):
    print("FATAL LOGIC ERROR in metrics()")

  accuracy = (tp + tn) / (N * 1.0)
  precision = (1.0 * tp) / (tp + fp)  # tp + fp != 0
  recall = (1.0 * tp) / (tp + fn)     # tp + fn != 0
  f1 = 2.0 / ((1.0 / precision) + (1.0 / recall))
  return (accuracy, precision, recall, f1)  # as a Tuple

# -----------------------------------------------------------

def train(net, ds, bs, lr, me, opt='sgd', verbose=False):
  # dataset, bat_size, lrn_rate, max_epochs, optimizer
  v = verbose
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.BCELoss()  # sigmoid activation
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)  

  if v: print("\nStarting training ")
  le = me // 4  # log interval: n log prints
  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      y = batch[1]  # target sex: 0 or 1

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if v:
      if epoch % le == 0:
        print("epoch = %5d  |  loss = %10.4f" % \
          (epoch, epoch_loss)) 
  if v: print("Done ") 

# -----------------------------------------------------------

def evaluate(soln, trn_ds, tst_ds, verbose=False):
  # compute the meta error of a soln
  # [n_hid, activ, bs, lr, me, opt]
  #   [0]    [1]   [2] [3] [4] [5]
  v = verbose

  n_hids = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
  activs = ['tanh', 'tanh','tanh','tanh','tanh',
    'relu', 'relu', 'relu', 'relu', 'relu']
  b_szs = [1, 2, 4, 6, 8, 10, 12, 14, 16, 20]
  rates = [0.001, 0.005, 0.008, 0.01, 0.02, 0.03, 0.05,
    0.08, 0.10, 0.12]
  max_eps = [50, 100, 200, 300, 400, 500, 600, 700, 800, 1000]
  opts = ['sgd', 'sgd', 'sgd', 'sgd', 'sgd',
    'adam', 'adam', 'adam', 'adam', 'adam']

  n_hid = n_hids[soln[0]]
  activ = activs[soln[1]]
  bs = b_szs[soln[2]]
  lr = rates[soln[3]]
  me = max_eps[soln[4]]
  opt = opts[soln[5]]

  T.manual_seed(1)  # prepare
  np.random.seed(1)

  net = Net(n_hid, activ).to(device)  # create NN

  net.train()
  train(net, trn_ds, bs, lr, me, opt, verbose)  # train NN

  net.eval()
  acc_train = accuracy_q(net, trn_ds)  # evaluate NN accuracy
  acc_test = accuracy_q(net, tst_ds) 
  acc_weighted = ((1 * acc_train) + (3 * acc_test)) / 4
  error = 1.0 - acc_weighted  # [0.0, 1.0]
  if v: print("train acc = %0.4f " % acc_train)
  if v: print("test_acc = %0.4f " % acc_test)
  return (acc_train, acc_test, error)

# -----------------------------------------------------------

def show_soln_to_hyperparams(soln):
  n_hids = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
  activs = ['tanh', 'tanh','tanh','tanh','tanh',
    'relu', 'relu', 'relu', 'relu', 'relu']
  b_szs = [1, 2, 4, 6, 8, 10, 12, 14, 16, 20]
  rates = [0.001, 0.005, 0.008, 0.01, 0.02, 0.03, 0.05,
    0.08, 0.10, 0.12]
  max_eps = [50, 100, 200, 300, 400, 500, 600, 700, 800, 1000]
  opts = ['sgd', 'sgd', 'sgd', 'sgd', 'sgd',
    'adam', 'adam', 'adam', 'adam', 'adam']

  n_hid = n_hids[soln[0]]
  activ = activs[soln[1]]
  bs = b_szs[soln[2]]
  lr = rates[soln[3]]
  me = max_eps[soln[4]]
  opt = opts[soln[5]]

  print("num hidden nodes = " + str(n_hid))
  print("hidden activation = " + str(activ))
  print("batch size = " + str(bs))
  print("learn rate = %0.4f " % lr)
  print("max epochs = " + str(me))
  print("optimizer = " + str(opt))

# -----------------------------------------------------------

def make_random_soln(dim, rnd):
  soln = rnd.randint(low=0, high=10, size=dim, dtype=int)
  return soln

def mutate(child_soln, mutate_prob, rnd):
  for k in range(len(child_soln)):
    q = rnd.random()  # in [0.0, 1.0)
    if q < mutate_prob:
      child_soln[k] = rnd.randint(0, 10)
  return

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nPeople gender evolutionary hyperparam search ")
  T.manual_seed(1)
  np.random.seed(1)
  rnd = np.random.RandomState(1)  # controls initial pop

  # 1. create Dataset objects
  print("\nCreating People train and test Datasets ")

  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"

  train_ds = PeopleDataset(train_file)  # 200 rows
  test_ds = PeopleDataset(test_file)    # 40 rows

  # 2. create pop. of possible solutions (hyperparams)
  pop_size = 10
  dim = 6  # number of hyperparameters to examine
  max_gen = 12
  used = {}  # set of hyperparams that have been evaluated

  print("\nCreating population of " + \
    str(pop_size) + " random possible solns ")
  pop = []  # list of tuples, tuple is (np arr, float)

  for i in range(pop_size):  # each set of hyperparams
    soln = make_random_soln(dim, rnd)
    soln_key = "".join(str(x) for x in soln)
    while soln_key in used:
      soln = make_random_soln(dim, rnd)
      soln_key = "".join(str(x) for x in soln)
    used[soln_key] = 1

    trn_acc, tst_acc, err = evaluate(soln, train_ds, test_ds,
      verbose=True)
    pop.append((soln, err))
  pop = sorted(pop, key=lambda tup:tup[1])  # low err to hi

  # 3. find best set of initial hyperparams
  best_soln = pop[0][0].copy()
  best_err = pop[0][1]

  print("\nBest initial soln: ")
  print(best_soln)
  print("Best initial weighted error = %0.4f " % best_err)
  print("Best initial weighted acc = %0.4f " % (1-best_err))

# -----------------------------------------------------------

  # 4. evolve
  print("\nBegin evolution ")
  for gen in range(max_gen):
    print("\ngeneration = " + str(gen))
    # 4a. pick two parents
    first = rnd.randint(0, pop_size // 2)  # good one
    second = rnd.randint(pop_size // 2, pop_size)  # weaker
    flip = rnd.randint(2)  # 0 or 1
    if flip == 0:
      parent_idxs = (first, second)
    else:
      parent_idxs = (second, first)

    # 4b. create child
    child_soln = np.zeros(dim, dtype=int)
    i = parent_idxs[0]; j = parent_idxs[1]
    parent1 = pop[i][0]
    parent2 = pop[j][0]
    for k in range(0, dim // 2):  # left half
      child_soln[k] = parent1[k]
    for k in range(dim // 2, dim):  # right half
      child_soln[k] = parent2[k]

    # 4c. mutate child, avoid duplicate
    mutate_prob = 0.5
    mutate(child_soln, mutate_prob, rnd)
    child_soln_key = "".join(str(x) for x in child_soln)
    while child_soln_key in used:
      mutate(child_soln, mutate_prob, rnd)
      child_soln_key = "".join(str(x) for x in child_soln)
    used[child_soln_key] = 1

    trn_acc, tst_acc, child_err = evaluate(child_soln,
      train_ds, test_ds, verbose=True)
    print(child_soln)
    print("child err = %0.4f " % child_err)

    # 4d. is child new best soln?
    if child_err < best_err:
      print("New best soln found at gen " + str(gen))
      best_soln = child_soln.copy()
      best_err = child_err
    else:
      # print("No improvement at gen " + str(gen))
      pass

    # 4e. replace weak pop soln with child
    idx = rnd.randint(pop_size // 2, pop_size)
    pop[idx] = (child_soln, child_err)  # Tuple

    # 4f. sort solns from best (low error) to worst
    pop = sorted(pop, key=lambda tup:tup[1]) 

  print("\nEnd evolution ")

# -----------------------------------------------------------

  # 5. show best hyperparameters found
  print("\nFinal best soln found: ")
  print(best_soln)
  print("\nFinal best hyperparameters found: ")
  show_soln_to_hyperparams(best_soln)

  print("\nFinal best error = %0.4f " % best_err)
  print("Final best weighted acc = %0.4f " % (1-best_err))

  train_acc, test_acc, _ = evaluate(best_soln, train_ds, test_ds)
  print("\nFinal best train acc = %0.4f " % train_acc)
  print("Final best test acc = %0.4f " % test_acc)

  print("\nEnd evolutionary parameter search ")

# -----------------------------------------------------------

  # 6. TODO: save model

if __name__== "__main__":
  main()