UCI Digits Image Classification Using a PyTorch CNN

One of my standard neural network examples is image classification on the MNIST dataset. The full MNIST (Modified National Institute of Standards and Technology) dataset has 60,000 images for training and 10,000 images for testing.

The UCI Digits dataset is similar to MNIST but smaller and easier to experiment with. Each UCI Digits image is an 8 x 8 (64 pixels) grayscale handwritten digit from ‘0’ to ‘9’. Each pixel value is an integer from 0 (white) to 16 (black).


Example UCI Digits images

The UCI Digits dataset can be found at archive.ics.uci.edu/ml/datasets/optical+recognition+of+handwritten+digits. The 3823-item training file is named optdigits.tra and the 1797-item test file is named optdigits.tes. The files are text files so I renamed them and added a “.txt” extension. Each line has 65 comma-delimited values. The first 64 values are the pixels (0 to 16) and the last value on each line is the digit (0 to 9).
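As a quick sanity check of the file format, a single line can be parsed with plain NumPy. This is just a sketch; the pixel values in the example line are made up for illustration and don't come from the actual dataset.

```python
import numpy as np

# one line in the optdigits format: 64 comma-delimited pixel
# values in [0, 16], followed by the class label (0-9)
# (these particular values are fabricated for the demo)
line = "0,1,6,15,12,1," + "0," * 58 + "3"

vals = np.array(line.split(","), dtype=np.float32)  # 65 values
pixels = vals[0:64] / 16.0   # normalize pixels to [0.0, 1.0]
label = int(vals[64])        # class label as an integer

print(pixels.shape)  # (64,)
print(label)         # 3
```

The division by 16.0 is the same normalization used by the Dataset class in the full demo program below.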

I created a CNN system using PyTorch. The code that defines my network is:

class CNN_Net(T.nn.Module):
  def __init__(self):
    super(CNN_Net, self).__init__()  # pre Python 3.3 syntax
    self.conv1 = T.nn.Conv2d(1, 16, 2)  # chnl-in, out, krnl
    self.conv2 = T.nn.Conv2d(16, 24, 2)

    self.fc1 = T.nn.Linear(96, 64)   # [24*2*2, x]
    self.fc2 = T.nn.Linear(64, 10)   # 10 output vals

    self.pool1 = T.nn.MaxPool2d(2, 2)   # kernel, stride
    self.drop1 = T.nn.Dropout(0.10)
    self.drop2 = T.nn.Dropout(0.15)

    # default weight and bias initialization
    # therefore order of definition matters
  
  def forward(self, x):
    # input x is Size([bs, 1, 8, 8])
    z = T.relu(self.conv1(x))     # Size([bs, 16, 7, 7])
    z = self.pool1(z)             # Size([bs, 16, 3, 3])
    z = self.drop1(z)             # Size([bs, 16, 3, 3])
    z = T.relu(self.conv2(z))     # Size([bs, 24, 2, 2])
   
    z = z.reshape(-1, 96)         # Size([bs, 96])
    z = T.relu(self.fc1(z))       # Size([bs, 64])
    z = self.drop2(z)             # Size([bs, 64])
    z = T.log_softmax(self.fc2(z), dim=1)  # for NLLLoss()
    return z                      # Size([bs, 10])

The code is paradoxically both simple and complex. It is simple if you have implemented CNNs before, because the parts (convolution layers, linear layers, pooling layers, dropout layers) are standard building blocks. However, there is an essentially unlimited number of ways to compose the building blocks, and each building block has many optional parameters.
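A recurring source of complexity is the shape arithmetic. The sizes in the comments of the forward() method can be verified with the standard output-size formula that both Conv2d and MaxPool2d follow. A minimal sketch, in plain Python:

```python
def conv_out(size, kernel, stride=1, padding=0):
  # standard output-size formula for Conv2d / MaxPool2d
  return (size + 2 * padding - kernel) // stride + 1

s = 8                         # 8x8 input image
s = conv_out(s, 2)            # conv1, kernel 2        -> 7
s = conv_out(s, 2, stride=2)  # pool1, kernel 2, str 2 -> 3
s = conv_out(s, 2)            # conv2, kernel 2        -> 2

print(s, 24 * s * s)  # 2 96 -- matches the fc1 input size
```

This is how the 96 in T.nn.Linear(96, 64) arises: 24 channels times a 2x2 spatial result.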

My demo code worked reasonably well and scored 99.35% accuracy on the training data (3798 of 3823 correct) and 96.22% accuracy on the test data (1729 of 1797 correct).

My motivation for creating the CNN system is related to a project I’m working on. I created a transformer-based autoencoder anomaly detection system. I used the UCI Digits dataset. To test the anomaly detection system I want to create some adversarial input data items using the Fast Gradient Sign Method (FGSM) technique, and then see if the anomaly detection system can find them when mixed with ordinary benign data items. To create FGSM items, I need access to a UCI Digits model.
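For reference, the core of FGSM is just one signed gradient step on the input rather than on the weights. Here is a minimal sketch of the idea, using a tiny stand-in linear model rather than the CNN above; the function name and the epsilon value are my own choices for the demo.

```python
import torch as T

def fgsm_attack(model, loss_func, x, y, epsilon):
  # perturb input x in the direction that increases the loss
  x_adv = x.clone().detach().requires_grad_(True)
  loss = loss_func(model(x_adv), y)
  loss.backward()                 # gradient w.r.t. the input
  x_adv = x_adv + epsilon * x_adv.grad.sign()
  return x_adv.clamp(0.0, 1.0).detach()  # keep pixels valid

T.manual_seed(1)
model = T.nn.Linear(64, 10)  # stand-in model, not the CNN above
loss_func = T.nn.CrossEntropyLoss()
x = T.rand(1, 64)            # fake normalized digit
y = T.tensor([3])            # fake label
x_adv = fgsm_attack(model, loss_func, x, y, epsilon=0.1)
print(T.max(T.abs(x_adv - x)).item())  # at most epsilon
```

Because the perturbation is epsilon times the sign of the gradient, each pixel moves by at most epsilon, so the adversarial image is visually close to the original.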

Good brain exercise.



Brain Boy was a comic book series. There were only six issues, published in 1962-1963. These are covers of the first three issues.

Brain Boy was Matt Price. While his mother was pregnant, a car accident involving an electrical tower killed his father and gave Matt mental powers, including levitation. When he became an adult he was recruited as a government agent, but he was still called “Brain Boy”, his childhood nickname.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols.

# uci_digits_cnn.py

# UCI Digits classification using a CNN
# note: intent is to use this as a basis for FGSM evil data
#  then use FGSM data to test TA anomaly detection

# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import matplotlib.pyplot as plt
import torch as T

device = T.device('cpu') 

# -----------------------------------------------------------

class UCI_Digits_Dataset(T.utils.data.Dataset):
  # like 8,12,0,16, . . 15,7
  # 64 pixel values [0-16], label/digit [0-9]

  def __init__(self, src_file):
    tmp_xy = np.loadtxt(src_file, usecols=range(0,65),
      delimiter=",", comments="#", dtype=np.float32)
    tmp_x = tmp_xy[:,0:64]
    tmp_x /= 16.0  # normalize pixels to [0.0, 1.0]
    tmp_x = tmp_x.reshape(-1, 1, 8, 8)  # bs, chnls, 8x8
    tmp_y = tmp_xy[:,64]  # float32 form, must convert to int

    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y, dtype=T.int64).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    pixels = self.x_data[idx]
    label = self.y_data[idx]
    return (pixels, label)  # as a tuple

# -----------------------------------------------------------

class CNN_Net(T.nn.Module):
  def __init__(self):
    super(CNN_Net, self).__init__()  # pre Python 3.3 syntax
    self.conv1 = T.nn.Conv2d(1, 16, 2)  # chnl-in, out, krnl
    self.conv2 = T.nn.Conv2d(16, 24, 2)

    self.fc1 = T.nn.Linear(96, 64)   # [24*2*2, x]
    self.fc2 = T.nn.Linear(64, 10)   # 10 output vals

    self.pool1 = T.nn.MaxPool2d(2, 2)   # kernel, stride
    self.drop1 = T.nn.Dropout(0.10)
    self.drop2 = T.nn.Dropout(0.15)

    # default weight and bias initialization
    # therefore order of definition matters
  
  def forward(self, x):
    # input x is Size([bs, 1, 8, 8])
    z = T.relu(self.conv1(x))     # Size([bs, 16, 7, 7])
    z = self.pool1(z)             # Size([bs, 16, 3, 3])
    z = self.drop1(z)             # Size([bs, 16, 3, 3])
    z = T.relu(self.conv2(z))     # Size([bs, 24, 2, 2])
   
    z = z.reshape(-1, 96)         # Size([bs, 96])
    z = T.relu(self.fc1(z))
    z = self.drop2(z)
    z = T.log_softmax(self.fc2(z), dim=1)  # for NLLLoss()
    return z

# -----------------------------------------------------------

def accuracy(model, ds):
  ldr = T.utils.data.DataLoader(ds,
    batch_size=len(ds), shuffle=False)
  n_correct = 0
  for data in ldr:
    (pixels, labels) = data
    with T.no_grad():
      oupts = model(pixels)
    (_, predicteds) = T.max(oupts, 1)
    n_correct += (predicteds == labels).sum().item()

  acc = (n_correct * 1.0) / len(ds)
  return acc

# -----------------------------------------------------------

def display_digit(ds, idx):
  # ds is a PyTorch Dataset
  data = ds[idx][0]  # [0] is the pixels, [1] is the label
  pixels = np.array(data)  # tensor to numpy
  pixels = pixels.reshape((8,8))
  for i in range(8):
    for j in range(8):
      pxl = pixels[i,j]  # or [i][j] syntax
      # print("%.2X" % pxl, end="")  # hexadecimal
      print("%3d" % pxl, end="")
    print("")

  plt.imshow(pixels, cmap=plt.get_cmap('gray_r'))
  plt.show() 
  plt.close() 

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin UCI Digits CNN classification demo ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset object
  print("\nLoading UCI digits data ")
  # train_data = ".\\Data\\uci_digits_train_100.txt"
  train_data = ".\\Data\\optdigits_train_3823.txt"
  train_ds = UCI_Digits_Dataset(train_data)
  bat_size = 4
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

# -----------------------------------------------------------

  # 2. create network
  print("\nCreating CNN classifier ")
  net = CNN_Net().to(device)
  net.train()  # set mode

# -----------------------------------------------------------

  # 3. train 
  loss_func = T.nn.NLLLoss()  # log_softmax output
  lrn_rate = 0.01
  opt = T.optim.SGD(net.parameters(), lr=lrn_rate)
  max_epochs = 50
  log_every = 10

  print("\nStarting training ")
  for epoch in range(max_epochs):
    epoch_loss = 0.0
    for bix, batch in enumerate(train_ldr):
      X = batch[0]  # 64 normalized input pixels
      Y = batch[1]  # the class label

      opt.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # for progress display
      loss_val.backward()            # compute gradients
      opt.step()                     # update weights

    if epoch % log_every == 0:
      print("epoch = %4d   loss = %0.4f" % (epoch, epoch_loss))

  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc_train = accuracy(net, train_ds)  # all at once
  print("Accuracy on training data = %0.4f" % acc_train)

  test_file = ".\\Data\\digits_uci_test_1797.txt"
  test_ds = UCI_Digits_Dataset(test_file)
  net.eval()
  acc_test = accuracy(net, test_ds)  # all at once
  print("Accuracy on test data = %0.4f" % acc_test)

# -----------------------------------------------------------

  # 5. save model
  # TODO

# -----------------------------------------------------------

  # 6. use model
  print("\nPredicting for 64 random pixel values ")
  x = np.random.random(64)   # in [0.0, 1.0]

  x = x.reshape(8,8)
  plt.tight_layout()
  plt.imshow(x, cmap=plt.get_cmap('gray_r'))
  plt.show()

  x = x.reshape(1, 1, 8, 8)  # make it a batch
  x = T.tensor(x, dtype=T.float32).to(device)  # to tensor
  with T.no_grad():
    oupt = net(x)  # tensor of 10 log-softmax values
  print("\nRaw output logits: ")
  print(oupt)

  oupt = oupt.numpy()  # convert to numpy array
  probs = np.exp(oupt)  # pseudo-probs
  np.set_printoptions(precision=4, suppress=True)
  print("\nOutput pseudo-probabilities: ")
  print(probs)

  pred_class = np.argmax(probs)
  print("\nPredicted class/digit: ")
  print(pred_class)
  
  print("\nEnd UCI Digits demo ")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()