Working With MNIST Data

Most of the PyTorch and Keras examples of MNIST image classification that I find on the Internet use built-in MNIST datasets. This is convenient but it hides a lot of important ideas. And for real image classification problems, you don’t get magic datasets handed to you.

I wrote a script to convert raw MNIST data to text files, and a script to display a specified MNIST digit image from the text files. First I downloaded the four zipped MNIST files, which use a custom binary format, to my machine. I unzipped the four files and added a “.bin” extension to each. The following script creates text files where each line is one 28×28 image. The first 784 values on a line (each between 0 and 255) are the pixels, and the last value is the label (0 to 9).

# converter_mnist.py

# convert MNIST binary to text file; combine pixels and labels
# target format:
# pixel_1 (tab) pixel_2 (tab) . . pixel_784 (tab) digit

# 1. manually download four zipped-binary files from
#    yann.lecun.com/exdb/mnist/ 
# 2. use 7-Zip to unzip files, add ".bin" extension
# 3. determine format you want and modify script

def convert(img_file, label_file, txt_file, n_images):
  lbl_f = open(label_file, "rb")   # MNIST has labels (digits)
  img_f = open(img_file, "rb")     # and pixel vals separate
  txt_f = open(txt_file, "w")      # output file to write to

  img_f.read(16)   # discard header info
  lbl_f.read(8)    # discard header info

  for i in range(n_images):   # number of images requested
    lbl = ord(lbl_f.read(1))  # get label (one byte -> int 0 to 9)
    for j in range(784):  # get 784 vals from the image file
      val = ord(img_f.read(1))
      txt_f.write(str(val) + "\t") 
    txt_f.write(str(lbl) + "\n")
  img_f.close(); txt_f.close(); lbl_f.close()

def main():
  convert(".\\UnzippedBinary\\train-images.idx3-ubyte.bin",
          ".\\UnzippedBinary\\train-labels.idx1-ubyte.bin",
          "mnist_train_1000.txt", 1000)

  convert(".\\UnzippedBinary\\t10k-images.idx3-ubyte.bin",
          ".\\UnzippedBinary\\t10k-labels.idx1-ubyte.bin",
          "mnist_test_100.txt", 100)

  # f = open(".\\mnist_train_1000.txt", "r")  # show raw values
  # for line in f:
  #   print(line)
  #   input()
  # f.close()

if __name__ == "__main__":
  main()
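
The converter script skips the 16-byte image header and the 8-byte label header. In the MNIST binary files those headers are big-endian 32-bit integers: a magic number (2051 for image files, 2049 for label files), the item count, and for images the row and column counts. Here is a minimal sketch of reading the headers instead of discarding them, which gives a quick sanity check that you opened the right file. The function names read_images_header and read_labels_header are made up for illustration and are not part of the script above, and the demo uses fake in-memory data rather than the real files.

```python
# read_idx_header.py
# sketch: read the MNIST binary headers instead of skipping them

import io
import struct

def read_images_header(img_f):
  # four big-endian unsigned 32-bit ints:
  # magic number, image count, rows, columns
  vals = struct.unpack(">IIII", img_f.read(16))
  magic, n_images, n_rows, n_cols = vals
  if magic != 2051:
    raise ValueError("not an image file, magic = " + str(magic))
  return n_images, n_rows, n_cols

def read_labels_header(lbl_f):
  # two big-endian unsigned 32-bit ints: magic number, label count
  magic, n_labels = struct.unpack(">II", lbl_f.read(8))
  if magic != 2049:
    raise ValueError("not a label file, magic = " + str(magic))
  return n_labels

# demo with fake in-memory data: two blank 28x28 images, two labels
fake_img = io.BytesIO(
  struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 784))
fake_lbl = io.BytesIO(
  struct.pack(">II", 2049, 2) + bytes([5, 0]))

print(read_images_header(fake_img))  # (2, 28, 28)
print(read_labels_header(fake_lbl))  # 2
```

With real files you would pass the file objects opened with "rb", in place of the two read() calls that discard the headers. After either function returns, the file position is at the first pixel or label byte, just as in the converter.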

And here’s a script that will display an MNIST image in human-friendly form:

# show_image.py

import numpy as np
import matplotlib.pyplot as plt

# assumes MNIST data line has 784 pixels then label.
# tab-delimited. pixel vals 0 to 255. label 0 to 9.

# ========================================================

def display_from_file(txt_file, idx):
  all_data = np.loadtxt(txt_file, delimiter="\t",
    usecols=range(0,785), dtype=np.int64)

  x_data = all_data[:,0:784]  # all rows, 784 cols
  y_data = all_data[:,784]    # all rows, last col

  label = y_data[idx]
  print("digit = ", str(label), "\n")

  pixels = x_data[idx]
  pixels = pixels.reshape((28,28))
  for i in range(28):
    for j in range(28):
      print("%.2X" % pixels[i,j], end="")
      print(" ", end="")
    print("")

  plt.imshow(pixels, cmap=plt.get_cmap('gray_r'))
  plt.show()  

# ========================================================

def display_from_array(arr, idx):
  # assumes arr = loadtxt(. . ) has been called
  x_data = arr[:,0:784]
  y_data = arr[:,784]

  label = y_data[idx]
  print("digit = ", str(label), "\n")

  pixels = x_data[idx]
  pixels = pixels.reshape((28,28))
  for i in range(28):
    for j in range(28):
      print("%.2X" % pixels[i,j], end="")
      print(" ", end="")
    print("")

  plt.imshow(pixels, cmap=plt.get_cmap('gray_r'))
  plt.show() 

# ========================================================

def main():
  print("\nBegin show MNIST image demo \n")

  img_file = ".\\Data\\mnist_train_1000.txt"
  display_from_file(img_file, idx=0)  # first image

  # arr = np.loadtxt(img_file, delimiter="\t",
  #   usecols=range(0,785), dtype=np.int64)
  # for i in range(0,3):
  #   display_from_array(arr, i)  # first three images
  #   print(" ")
  #   input()

  print("\nEnd \n")

if __name__ == "__main__":
  main()
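
Once the data is in text form, a likely next step is to split each line into predictor values and a label so the data can be fed to a classifier. Here is a rough sketch, assuming the tab-delimited format produced by the converter script. The function name load_mnist_text and its n_pixels parameter are hypothetical, and dividing the pixels by 255 to scale them to the range 0 to 1 is a common convention, not something the scripts above do. The demo uses a tiny fake two-line file with 4 pixels per line so it runs without the real data.

```python
# load_mnist_text.py
# sketch: load converter output into pixel and label arrays

import io
import numpy as np

def load_mnist_text(txt_file, n_pixels=784):
  # each line: n_pixels tab-separated pixel vals, then the label
  all_data = np.loadtxt(txt_file, delimiter="\t",
    dtype=np.int64, ndmin=2)
  x = all_data[:, 0:n_pixels].astype(np.float32) / 255.0
  y = all_data[:, n_pixels]   # labels stay as integers
  return x, y

# demo with fake data: 2 lines, 4 pixels each, then a label
fake = io.StringIO("0\t255\t51\t0\t7\n255\t0\t0\t102\t3\n")
x, y = load_mnist_text(fake, n_pixels=4)
print(x.shape)   # rows = images, cols = pixels
print(y)         # one label per image
```

For the real data you would call load_mnist_text(".\\Data\\mnist_train_1000.txt") and leave n_pixels at its default of 784.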

When working with machine learning, sometimes it’s easy to forget that all problems start with data. Data represents information and there are multiple ways to look at data and information.



I went to an interesting interactive show about the life and works of Vincent van Gogh (1853-1890). There were some 3D displays where you can take photos. Left: “Entrance Hall to Saint-Paul Hospital (Asylum)” (1889). Right: “Bedroom in Arles” (1888).

