PyTorch Multi-Class Classification Using MSELoss and One-Hot Encoded Data

Until relatively recently, the traditional way to do multi-class classification with a neural network was to 1.) encode the data file labels-to-predict using one-hot encoding (like “0, 1, 0” or “1, 0, 0”), 2.) make a neural network with softmax activation on the output nodes, and 3.) train using mean squared error.

But by far the most common way to do multi-class classification with a PyTorch network is to 1.) encode the data file labels-to-predict using ordinal encoding (like “0” or “1” or “2”), 2.) make a neural network with no activation on the output nodes, 3.) train using the special CrossEntropyLoss() function.
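The two schemes can be sketched side by side. This is a minimal illustration, not a full program: the 4-3 layer sizes match the Iris data (four predictors, three species) but the single-layer networks and random batch are hypothetical.

```python
import torch as T

T.manual_seed(1)  # reproducibility

x = T.randn(8, 4)                # hypothetical batch: 8 items, 4 predictors
y_ord = T.randint(0, 3, (8,))    # ordinal labels: 0, 1, or 2
y_oh = T.nn.functional.one_hot(y_ord,
  num_classes=3).to(T.float32)   # one-hot labels

# traditional: softmax on the output nodes + one-hot targets + MSELoss
net_old = T.nn.Sequential(T.nn.Linear(4, 3), T.nn.Softmax(dim=1))
loss_old = T.nn.MSELoss()(net_old(x), y_oh)

# common modern: raw logits (no output activation) + ordinal targets
# + CrossEntropyLoss, which applies log-softmax internally
net_new = T.nn.Linear(4, 3)
loss_new = T.nn.CrossEntropyLoss()(net_new(x), y_ord)
```

Note that CrossEntropyLoss() wants the raw logits and the integer class labels; applying softmax yourself before the loss would be a subtle bug.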

I recently explored creating a PyTorch multi-class classifier using the older traditional approach with mean squared error. While coding that experiment I realized that an efficient variation would be to use an ordinal encoded data file rather than a one-hot encoded file, and then, when reading the data into memory, programmatically convert the ordinal values to one-hot vectors.

To cut to the chase, the idea worked as expected.


Left: A program that uses MSELoss() with one-hot encoded data. Right: A program that uses MSELoss() with ordinal encoded data that is programmatically converted to one-hot vectors in memory. Both programs give identical results, which is what should happen.


I implemented the idea for the Iris data, where there are four predictor variables and three species to predict. The key to the idea was implementing a PyTorch Dataset object that reads the ordinal encoded data and converts it to one-hot encoded tensor data:

import numpy as np
import torch as T

device = T.device("cpu")  # defined globally in the full program

class IrisDataset(T.utils.data.Dataset):
  def __init__(self, src_file, num_rows=None):
    # each data line is like 5.0, 3.5, 1.3, 0.3, 2
    # convert the ordinal label to a one-hot vector
    tmp_x_data = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,4), delimiter=",", skiprows=0,
      dtype=np.float32)
    tmp_y_data = np.loadtxt(src_file, max_rows=num_rows,
      usecols=[4], delimiter=",", skiprows=0,
      dtype=np.int64)

    self.x_data = T.tensor(tmp_x_data,
      dtype=T.float32).to(device)

    n_rows = len(tmp_y_data)
    n_cols = 3
    dims = (n_rows, n_cols)
    self.y_data = T.zeros(dims,
      dtype=T.float32).to(device)
    for i in range(n_rows):
      j = tmp_y_data[i]  # the ord value 0, 1, or 2
      self.y_data[i][j] = 1.0

    self.num_rows = n_rows

  def __len__(self):
    return self.num_rows

  def __getitem__(self, idx):
    if T.is_tensor(idx):
      idx = idx.tolist()
    
    preds = self.x_data[idx]
    spcs = self.y_data[idx]
    sample = { 'predictors' : preds, 'species' : spcs }

    return sample
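As a side note, the explicit loop that builds the one-hot targets can be replaced by PyTorch's built-in one_hot() function. A minimal sketch of the same conversion, using hypothetical label values in place of the loaded file data:

```python
import numpy as np
import torch as T

# hypothetical ordinal species labels as np.loadtxt() would return them
tmp_y_data = np.array([2, 0, 1, 1], dtype=np.int64)

# build the float32 one-hot rows in one call, no explicit loop
y_data = T.nn.functional.one_hot(T.tensor(tmp_y_data),
  num_classes=3).to(T.float32)

print(y_data[0])  # tensor([0., 0., 1.])  -- label 2 maps to column 2
```

The loop version is arguably easier to understand, but one_hot() avoids per-element indexing, which matters for large datasets.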

There isn’t much code in the Dataset class but it’s surprisingly tricky in the details. The indirect moral of the story is this: Any successful project or company or effort needs people who see “the big picture” — the idea people. But success only happens with execution. Especially in tech companies, it’s easy to come up with a big idea, but it’s never easy to write the code to make that idea come to life.
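To make the tricky details concrete, here is a self-contained sketch of just the loading-and-converting step, fed by an in-memory comma-delimited sample (the values are hypothetical) instead of a file on disk:

```python
import io
import numpy as np
import torch as T

# three hypothetical Iris-format lines: 4 predictors, then an ordinal label
raw = io.StringIO("5.0,3.5,1.3,0.3,0\n6.1,2.8,4.7,1.2,1\n7.2,3.0,5.8,1.6,2\n")
tmp_x = np.loadtxt(raw, usecols=range(0,4), delimiter=",",
  dtype=np.float32)
raw.seek(0)  # rewind; loadtxt consumed the stream
tmp_y = np.loadtxt(raw, usecols=[4], delimiter=",", dtype=np.int64)

# the same loop the Dataset __init__() uses to build one-hot targets
y_data = T.zeros((len(tmp_y), 3), dtype=T.float32)
for i in range(len(tmp_y)):
  y_data[i][tmp_y[i]] = 1.0
```

One of the details that bit me: the label column must be read as int64 so it can be used as an index, while the predictors must be float32 to match the default PyTorch dtype.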


Two somewhat enigmatic illustrations by Japanese artist Ichiro Tsuruta (1954-). He is known for “bijinga” (meaning beautiful female figure) works like these. It’s easy for someone to think of creating an illustration like any of these, but it’s another matter to execute. I don’t find these illustrations particularly appealing, but I have great respect for the effort required to create them.

