PyTorch Dataset Transforms Aren’t Useful for Non-Image Data

I’ve been doing a deep dive into the PyTorch Dataset and DataLoader objects, peeling away layer after layer of detail. I’m now at a very low level, which makes the topic of this blog post difficult to explain. I’ll do my best, but be aware that the topic is complicated.

To train a PyTorch neural network, you must fetch training data, partition it into batches, possibly normalize and encode it if the data is in raw form, and feed the batches of data to the neural network. Although the process is simple in principle, there are many tricky details to attend to.

Over the past 18 months or so, it has become standard practice for PyTorch neural networks to use the Dataset and DataLoader classes contained in the torch.utils.data module. A Dataset class is really just a skeleton interface: you must supply custom code to read data into memory and to fetch a single item for use by a DataLoader. (You usually don’t have to customize a DataLoader object.)
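To make the skeleton idea concrete, here is a minimal sketch of a custom Dataset and the DataLoader that serves it. The class name and the random dummy data are just placeholders for illustration; the full demo program appears at the end of this post.

```python
import torch as T

class MinimalDataset(T.utils.data.Dataset):
  # bare-bones custom Dataset: store the data, report its
  # length, and serve one (predictors, label) item by index
  def __init__(self, data, labels):
    self.data = data      # tensor of predictors
    self.labels = labels  # tensor of labels

  def __len__(self):
    return len(self.data)

  def __getitem__(self, idx):
    return self.data[idx], self.labels[idx]

ds = MinimalDataset(T.randn(10, 4), T.zeros(10, dtype=T.int64))
ldr = T.utils.data.DataLoader(ds, batch_size=4, shuffle=False)
for xb, yb in ldr:
  print(xb.shape)  # batches of 4, 4, then 2 items
```

Notice that the DataLoader needs no customization at all; it batches and (optionally) shuffles whatever the Dataset’s __getitem__() returns.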

At some point you must convert training data from NumPy form to PyTorch tensor form. This is a quite complicated topic in itself. Hold on, I’m getting close to the topic of this blog post, but I’m not quite there yet.
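One of the subtleties is that the two common conversion functions behave differently: T.from_numpy() shares memory with the source array, while T.tensor() makes a copy. A small sketch (the array values are arbitrary):

```python
import numpy as np
import torch as T

arr = np.array([[0.5, 1.0], [0.25, 0.75]], dtype=np.float32)

t1 = T.from_numpy(arr)  # shares memory with arr -- no copy
t2 = T.tensor(arr)      # always copies the data

arr[0, 0] = 99.0        # mutate the source array
print(t1[0, 0].item())  # 99.0 -- t1 sees the change
print(t2[0, 0].item())  # 0.5  -- t2 does not
```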

The PyTorch “torchvision” package has many classes and functions that are useful for processing image data such as the MNIST handwritten digits dataset or the CIFAR-10 general images dataset. The torchvision.transforms module has transform classes that can apply preprocessing to an image before it gets fed to the neural network being trained. There are roughly 40 built-in transforms such as CenterCrop, Grayscale, Resize, and Normalize. And you can define your own custom transforms too. And there is also a Compose class that allows you to chain transforms together.

These image transforms are all a type of data normalization. It makes sense to have a library of such normalizing transforms because the images in an image dataset are often different shapes, sizes, color scales, and so on. And coding image transforms from scratch is slightly tricky.

So, I was wondering if it makes sense to use the torchvision transform mechanism on non-image data. For example, using the mechanism on something like the Iris Dataset where the goal is to classify a flower as one of three species of iris (“setosa”, “versicolor” or “virginica”) based on four numeric predictors (petal length and width, sepal length and width).

I experimented for a couple of days and came to the conclusion that even though the torchvision transform mechanism works for non-image data, the increased complexity makes the technique not worth the effort. Briefly, if you want to transform data, there are two scenarios. Some transforms, such as converting to a tensor or dividing a numeric predictor by a constant to scale it, are easy, and so there’s no need for the transform mechanism. Other transforms are quite tricky, such as an adaptive normalization where data is normalized differently at different times during training. In those situations, the torchvision mechanism makes a complex programming task even more complex without much benefit.

Now, all of this is just my opinion, and another programmer might find the torchvision transform mechanism applied to non-image data worth the effort. To him I say: more power to you. If there were just one way to create a neural network, programming wouldn’t be as interesting as it is.


I love tiki-themed bars. My favorite drink is the Hawaiian-style mai tai (fruity with a dash of red grenadine syrup). Many tiki bars are dark and so you need a tiki torch to see – a different kind of torch-vision.


Data (the people_train.txt file, tab-separated):

1   0   0.057143   0   1   0   1.000000   1
0   1   1.000000   0   0   1   0.016598   2
1   0   0.257143   0   1   0   0.329876   0
1   0   0.200000   1   0   0   0.458506   1
1   0   0.428571   0   0   1   0.000000   2
0   1   0.485714   0   1   0   0.371369   0
0   1   0.085714   0   1   0   0.188797   1
1   0   0.171429   1   0   0   0.802905   0
Code:

# pytorch_dataset_demo.py
# PyTorch 1.5.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

# demonstrates Transforms not really useful for non-image data

import numpy as np
import torch as T
import torchvision.transforms as tvt

device = T.device("cpu")  # apply to Tensor or Module

# -----------------------------------------------------------

# predictors and label in same file
# data has been normalized and encoded
#   1  0  0.057143  0  1  0  1.000000  2

class PeopleDataset(T.utils.data.Dataset):

  def __init__(self, src_file, num_rows=None, transform=None):
    x_data = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,7), delimiter="\t", skiprows=0,
      dtype=np.float32)
    y_data = np.loadtxt(src_file, max_rows=num_rows,
      usecols=7, delimiter="\t", skiprows=0, dtype=np.int64)

    # self.x_data = T.tensor(x_data, dtype=T.float32).to(device)
    # self.y_data = T.tensor(y_data, dtype=T.long).to(device)

    self.x_data = transform(x_data)  # to_tensor() + divide by 2
    # careful: applying the same transform to the labels would
    # divide the class IDs by 2 too -- convert them directly
    self.y_data = T.from_numpy(y_data).to(device)

    self.transform = transform  # if want to use in getitem()
    
    self.num_rows=num_rows  # not essential

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    if T.is_tensor(idx):
      idx = idx.tolist()
    preds = self.x_data[idx, 0:7]
    pol = self.y_data[idx]
    sample = { 'predictors' : preds, 'political' : pol }

    return sample

# define Transforms

# function + Lambda technique.
def to_tensor_transform(np_data):
  return T.from_numpy(np_data).to(device)

# callable class technique
class ToTensor(object):
  def __call__(self, np_data):
    return T.from_numpy(np_data).to(device)

def normalize_transform(tensor_data):
  return (tensor_data / 2.0)

# -----------------------------------------------------------

def main():
  print("\nBegin PyTorch Dataset and Dataloader demo ")

  # 0. prepare
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create DataLoader object
  print("\nCreating People train and DataLoader ")

  train_file = ".\\people_train.txt"
  # my_transform 
  #  = tvt.Compose([tvt.Lambda(lambda x: to_tensor_transform(x)),
  #             tvt.Lambda(lambda x: normalize_transform(x))])

  my_transform = tvt.Compose([ToTensor(),
    tvt.Lambda(lambda x: normalize_transform(x))])

  train_ds = PeopleDataset(train_file, num_rows=8, 
    transform=my_transform)

  bat_size = 3
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. iterate thru training data twice
  for epoch in range(2):
    print("\n==============================\n")
    print("Epoch = " + str(epoch))
    for (batch_idx, batch) in enumerate(train_ldr):
      print("\nBatch = " + str(batch_idx))
      X = batch['predictors']  # [3,7]
      # Y = T.flatten(batch['political'])  # 
      Y = batch['political']
      print(X)
      print(Y)
  print("\n==============================")

  print("\nEnd demo ")

if __name__ == "__main__":
  main()