A binary classification problem is one where the goal is to predict a discrete value where there are only two possibilities. For example, you might want to predict the sex of a person based on their age, income, and so on. I’ve been looking at incorporating a Transformer component into a PyTorch neural network to see if the technique works or not.
Bottom line: I got a binary classification demo up and running, but because the system is so complicated, it's not clear to me whether a binary classifier network with a Transformer component is better than, worse than, or roughly equivalent to, a standard deep neural network without a Transformer component.

The training run showed an unusual pattern: the loss value barely changed for the first 2000 epochs, but then dropped quickly.
I used one of my standard datasets for binary classification. The data looks like:
1 0.24 1 0 0 0.2950 0 0 1
0 0.39 0 0 1 0.5120 0 1 0
1 0.63 0 1 0 0.7580 1 0 0
0 0.36 1 0 0 0.4450 0 1 0
. . .
Each line of data represents a person. The fields are sex (male = 0, female = 1), age (normalized by dividing by 100), state (Michigan = 100, Nebraska = 010, Oklahoma = 001), annual income (divided by 100,000), and politics type (conservative = 100, moderate = 010, liberal = 001). The goal is to predict the sex of a person from their age, state, income, and politics type. There are 200 training items and 40 test items.
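The encoding scheme can be sketched in a few lines of Python. The helper function and dictionary names here are mine, for illustration only; the demo reads the already-encoded text file directly:

```python
# Sketch of the data encoding described above (helper names are hypothetical).
STATES = {"michigan": [1, 0, 0], "nebraska": [0, 1, 0], "oklahoma": [0, 0, 1]}
POLITICS = {"conservative": [1, 0, 0], "moderate": [0, 1, 0],
            "liberal": [0, 0, 1]}

def encode_person(sex, age, state, income, politics):
    # sex: 0 = male, 1 = female; age divided by 100; income by 100,000
    return [sex, age / 100.0] + STATES[state] + \
           [income / 100_000.0] + POLITICS[politics]

row = encode_person(1, 24, "michigan", 29500, "liberal")
# row -> [1, 0.24, 1, 0, 0, 0.295, 0, 0, 1]  (first line of sample data)
```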
My demo network used an (8-32)-T-10-1 architecture. There are 8 input nodes that are mapped to 4 nodes each using a custom numeric embedding layer. Those 32 values are fed to a TransformerEncoder, then a fully connected hidden layer with 10 nodes, then a single output node. The output node value will be between 0.0 and 1.0, where a value less than 0.5 means class 0 = male, and a value of 0.5 or greater means class 1 = female.
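The tensor shapes through that pipeline can be traced with a minimal sketch. Here I substitute a standard nn.Linear for the custom numeric embedding layer, just to follow the dimensions; the demo's real embedding layer is shown in the full code below:

```python
import torch as T

bs = 10                      # batch size
x = T.randn(bs, 8)           # 8 input features per person
embed = T.nn.Linear(8, 32)   # stand-in for the custom embedding layer
enc_layer = T.nn.TransformerEncoderLayer(d_model=4, nhead=2,
  dim_feedforward=10, batch_first=True)
trans_enc = T.nn.TransformerEncoder(enc_layer, num_layers=2)

z = embed(x)                 # [bs, 32]
z = z.reshape(-1, 8, 4)      # [bs, seq=8, embed=4]
z = trans_enc(z)             # [bs, 8, 4] -- encoder preserves shape
z = z.reshape(-1, 32)        # [bs, 32], fed to the 32-10-1 head
print(z.shape)               # torch.Size([10, 32])
```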
I implemented a program-defined metrics() function that computes accuracy, precision, recall, and F1 score. After training, the model scored 85.50% accuracy on the training data (171 out of 200 correct), and 80.00% accuracy on the test data (32 out of 40 correct).
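The arithmetic behind those four metrics is easy to check from raw confusion-matrix counts. This helper is my own sketch, not the demo's metrics() function, and the counts are made up for illustration:

```python
def metrics_from_counts(tp, fp, tn, fn):
    # accuracy  = (TP + TN) / N
    # precision = TP / (TP + FP)
    # recall    = TP / (TP + FN)
    # F1 = harmonic mean of precision and recall
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    prec = tp / (tp + fp)   # assumes tp + fp != 0
    rec = tp / (tp + fn)    # assumes tp + fn != 0
    f1 = 2 / ((1 / prec) + (1 / rec))
    return acc, prec, rec, f1

# illustrative counts only (not the demo's actual confusion matrix)
acc, prec, rec, f1 = metrics_from_counts(tp=50, fp=10, tn=120, fn=20)
print("%0.4f %0.4f %0.4f %0.4f" % (acc, prec, rec, f1))
```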
It’s not possible to draw any strong conclusions because the dataset is so small and the neural architecture has many hyperparameters that I didn’t experiment with. In addition to all the usual neural parameters, a TransformerEncoder has an embedding size, number of attention heads, size of the internal hidden layer, number of encoding layers, and positional encoding dropout rate.
It was a very interesting challenge.

I’m not a fan of most Japanese sci-fi monster movies, with the exception of Godzilla (1954) and Rodan (1956) — two excellent movies. In Japanese sci-fi, one form of gender classification is pretty easy: women aliens are usually evil.
Top Left: In “Destroy All Monsters” (1968), the alien Kilaaks use mind control to get Godzilla, Rodan (a giant pterodactyl), Mothra (a giant moth/larva), and a few other monsters to attack Earth. The monsters break free of the control and help Earth defeat the Kilaaks.
Top Right: In “Invasion of Astro-Monster” aka “Monster Zero” (1965), the alien Xiliens ask Earth if they can borrow Godzilla and Rodan to defeat Ghidorah (giant three-headed dragon) who is ravaging their home planet. But the Xiliens then try to use all three monsters to conquer Earth. Earth’s technology prevails.
Bottom Left: In “Gamera vs. Guiron” (1969), the alien Terrans have an appetite for human brains. Yuck. The Earth’s Gamera (giant turtle monster) defeats the Terrans’ Guiron (a monster that has a knife-shaped head).
Bottom Right: In “Godzilla vs. Megalon” (1973), an alien race, the Seatopians, have been living undiscovered under the sea. I won’t even try to summarize the incomprehensible plot, but one highlight of the movie was a Seatopian ritual dance complete with clear plastic outfits, pointy hats, and white go-go boots.
Demo code. The training and test data is at https://jamesmccaffreyblog.com/2022/09/23/binary-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_gender_transformer.py
# binary classification using a TransformerEncoder
# PyTorch 2.0.0-CPU Anaconda3-2022.10 Python 3.9.13
# Windows 10/11
import numpy as np
import torch as T
device = T.device('cpu') # apply to Tensor or Module
T.set_num_threads(1)
class PeopleDataset(T.utils.data.Dataset):
  # sex  age   state  income  politics
  # 0    0.27  0 1 0  0.7610  0 0 1
  # 1    0.19  0 0 1  0.6550  1 0 0
  # sex: 0 = male, 1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_data = np.loadtxt(src_file, usecols=range(0,9),
      delimiter="\t", comments="#", dtype=np.float32)
    self.x_data = T.tensor(all_data[:,1:9],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,0],
      dtype=T.float32).to(device)  # float32 required
    self.y_data = self.y_data.reshape(-1,1)  # 2-D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    feats = self.x_data[idx,:]  # idx row, all 8 cols
    sex = self.y_data[idx,:]    # idx row, the only col
    return (feats, sex)         # as a Tuple
# -----------------------------------------------------------
class SkipLinear(T.nn.Module):  # numeric embedding layer
  # -----
  class Core(T.nn.Module):
    def __init__(self, n):
      super().__init__()
      # 1 node to n nodes, n >= 2
      self.weights = T.nn.Parameter(T.zeros((n,1),
        dtype=T.float32))
      self.biases = T.nn.Parameter(T.zeros(n,
        dtype=T.float32))  # one bias per node
      lim = 0.01
      T.nn.init.uniform_(self.weights, -lim, lim)
      T.nn.init.zeros_(self.biases)

    def forward(self, x):
      wx = T.mm(x, self.weights.t())
      v = T.add(wx, self.biases)
      return v
  # -----

  def __init__(self, n_in, n_out):
    super().__init__()
    self.n_in = n_in; self.n_out = n_out
    if n_out % n_in != 0:
      raise ValueError("n_out must be divisible by n_in")
    n = n_out // n_in  # num nodes per input
    self.lst_modules = \
      T.nn.ModuleList([SkipLinear.Core(n) for \
      i in range(n_in)])

  def forward(self, x):
    lst_nodes = []
    for i in range(self.n_in):
      xi = x[:,i].reshape(-1,1)
      oupt = self.lst_modules[i](xi)
      lst_nodes.append(oupt)
    result = T.cat((lst_nodes[0], lst_nodes[1]), 1)
    for i in range(2,self.n_in):
      result = T.cat((result, lst_nodes[i]), 1)
    result = result.reshape(-1, self.n_out)
    return result
# -----------------------------------------------------------
class TransformerNet(T.nn.Module):
  def __init__(self):
    super().__init__()
    # numeric pseudo-embedding, dim=4
    self.embed = SkipLinear(8, 32)  # 8 inputs, each goes to 4
    self.pos_enc = \
      PositionalEncoding(4, dropout=0.00)  # positional
    self.enc_layer = T.nn.TransformerEncoderLayer(d_model=4,
      nhead=2, dim_feedforward=10,
      batch_first=True)  # d_model divisible by nhead
    self.trans_enc = T.nn.TransformerEncoder(self.enc_layer,
      num_layers=2)  # original paper used 6 layers
    self.fc1 = T.nn.Linear(32, 10)  # 10 hidden nodes
    self.fc2 = T.nn.Linear(10, 1)

  def forward(self, x):
    # x = 8 inputs, fixed length
    z = self.embed(x)        # 8 inputs to 32 embed
    z = z.reshape(-1, 8, 4)  # [bat, seq, embed]
    z = self.pos_enc(z)
    z = self.trans_enc(z)
    z = z.reshape(-1, 32)    # torch.Size([bs, 32])
    z = T.tanh(self.fc1(z))
    z = T.sigmoid(self.fc2(z))  # for BCELoss()
    return z
# -----------------------------------------------------------
class PositionalEncoding(T.nn.Module):  # based on documentation code
  def __init__(self, d_model: int, dropout: float=0.0,
      max_len: int=5000):
    super().__init__()
    self.dropout = T.nn.Dropout(p=dropout)
    pe = T.zeros(max_len, d_model)  # like 5000x4
    position = \
      T.arange(0, max_len, dtype=T.float).unsqueeze(1)
    div_term = T.exp(T.arange(0, d_model, 2).float() * \
      (-np.log(10_000.0) / d_model))
    pe[:, 0::2] = T.sin(position * div_term)
    pe[:, 1::2] = T.cos(position * div_term)
    pe = pe.unsqueeze(0)  # [1, max_len, d_model] for batch_first
    self.register_buffer('pe', pe)  # allows state-save

  def forward(self, x):
    # x is [bs, seq_len, d_model] because batch_first=True,
    # so index the sequence dimension, not the batch dimension
    x = x + self.pe[:, :x.size(1), :]
    return self.dropout(x)
# -----------------------------------------------------------
def metrics(model, ds, thresh=0.5):
  # note: N = total number of items = TP + FP + TN + FN
  # accuracy  = (TP + TN) / N
  # precision = TP / (TP + FP)
  # recall    = TP / (TP + FN)
  # F1 = 2 / [(1 / precision) + (1 / recall)]
  tp = 0; tn = 0; fp = 0; fn = 0
  for i in range(len(ds)):
    inpts = ds[i][0].reshape(1,-1)  # make it a batch
    target = ds[i][1].reshape(1)    # float32 [0.0] or [1.0]
    target = target.long()          # int 0 or 1
    with T.no_grad():
      p = model(inpts)  # between 0.0 and 1.0

    if target == 1 and p >= thresh:    # TP
      tp += 1
    elif target == 1 and p < thresh:   # FN
      fn += 1
    elif target == 0 and p < thresh:   # TN
      tn += 1
    elif target == 0 and p >= thresh:  # FP
      fp += 1

  N = tp + fp + tn + fn
  if N != len(ds):
    print("FATAL LOGIC ERROR in metrics()")
  accuracy = (tp + tn) / (N * 1.0)
  precision = (1.0 * tp) / (tp + fp)  # assumes tp + fp != 0
  recall = (1.0 * tp) / (tp + fn)     # assumes tp + fn != 0
  f1 = 2.0 / ((1.0 / precision) + (1.0 / recall))
  return (accuracy, precision, recall, f1)  # as a Tuple
# -----------------------------------------------------------
def main():
  # 0. get started
  print("\nPeople gender using PyTorch TransformerEncoder")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset and DataLoader objects
  print("\nCreating People train and test Datasets ")
  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows
  test_ds = PeopleDataset(test_file)    # 40 rows
  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create neural network
  print("\nCreating (8-32)-T-10-1 classifier ")
  net = TransformerNet().to(device)
  net.train()  # set training mode

  # 3. train network
  lrn_rate = 0.05
  loss_func = T.nn.BCELoss()  # binary cross entropy
  # loss_func = T.nn.MSELoss()
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)
  # optimizer = T.optim.Adam(net.parameters(), lr=lrn_rate)
  max_epochs = 2500
  ep_log_interval = 200

  print("\nLoss function: " + str(loss_func))
  print("Optimizer: " + str(optimizer.__class__.__name__))
  print("Learn rate: " + "%0.3f" % lrn_rate)
  print("Batch size: " + str(bat_size))
  print("Max epochs: " + str(max_epochs))

  print("\nStarting training")
  for epoch in range(0, max_epochs):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]   # [bs,8] inputs
      Y = batch[1]   # [bs,1] targets
      oupt = net(X)  # [bs,1] computeds
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      optimizer.zero_grad()  # reset all gradients
      loss_val.backward()    # compute new gradients
      optimizer.step()       # update all weights
    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %8.4f" % \
        (epoch, epoch_loss))
  print("Done ")

  # -----------------------------------------------------------

  # 4. evaluate model
  net.eval()
  metrics_train = metrics(net, train_ds, thresh=0.5)
  print("\nMetrics for train data: ")
  print("accuracy  = %0.4f " % metrics_train[0])
  print("precision = %0.4f " % metrics_train[1])
  print("recall    = %0.4f " % metrics_train[2])
  print("F1        = %0.4f " % metrics_train[3])

  metrics_test = metrics(net, test_ds, thresh=0.5)
  print("\nMetrics for test data: ")
  print("accuracy  = %0.4f " % metrics_test[0])
  print("precision = %0.4f " % metrics_test[1])
  print("recall    = %0.4f " % metrics_test[2])
  print("F1        = %0.4f " % metrics_test[3])

  # 5. save model
  print("\nSaving trained model state_dict ")
  net.eval()
  # path = ".\\Models\\people_gender_model.pt"
  # T.save(net.state_dict(), path)

  # 6. make a prediction
  print("\nSetting age = 30  Oklahoma  $40,000  moderate ")
  X = np.array([[0.30, 0,0,1, 0.4000, 0,1,0]],
    dtype=np.float32)
  X = T.tensor(X, dtype=T.float32).to(device)

  net.eval()
  with T.no_grad():
    oupt = net(X)  # a Tensor
  pred_prob = oupt.item()  # scalar in [0.0, 1.0]
  print("Computed output: ", end="")
  print("%0.4f" % pred_prob)
  if pred_prob < 0.5:
    print("Prediction = male")
  else:
    print("Prediction = female")

  print("\nEnd binary demo ")

if __name__ == "__main__":
  main()