I wrote an article titled “Preparing CIFAR Image Data for PyTorch” in the April 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/04/01/preparing-cifar-data.aspx.
A common dataset for image classification experiments is CIFAR-10. The goal of a CIFAR-10 problem is to analyze a crude 32 x 32 color image and predict which of 10 classes the image belongs to. The 10 classes are plane, car, bird, cat, deer, dog, frog, horse, ship and truck.
The CIFAR-10 (Canadian Institute for Advanced Research, 10 classes) data has 50,000 images intended for training and 10,000 images for testing. The article explains how to get the raw source CIFAR-10 data, convert the data from binary to text and save the data as a text file that can be used to train a PyTorch neural network classifier.
Most popular neural network libraries, including PyTorch (through its torchvision package) and Keras, have some form of built-in CIFAR-10 dataset designed to work with the library. But there are two problems with using a built-in dataset. First, data access becomes a magic black box and important information is hidden. Second, the built-in datasets use all 50,000 training and 10,000 test images and these are difficult to work with because they’re so large.
The cifar-10-batches-py source directory contains six binary files that have names with no file extension: data_batch_1, data_batch_2, data_batch_3, data_batch_4, data_batch_5 and test_batch. Each of these files contains 10,000 images in Python “pickle” binary format.
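A minimal sketch of reading one of these batch files follows. It assumes the layout documented on the CIFAR-10 download page: each file unpickles to a dictionary whose b"data" entry holds the 10,000 rows of pixel values and whose b"labels" entry holds the matching class indices 0 to 9. The function name load_batch and the example path are illustrative, not taken from the article. NumPy must be installed to unpickle the real files because the pixel data is stored as a NumPy array.

```python
import pickle

def load_batch(path):
    """Unpickle one CIFAR-10 batch file and return (pixels, labels)."""
    with open(path, "rb") as f:
        # encoding="bytes" is used because the files were pickled by Python 2,
        # so the dictionary keys come back as bytes objects such as b"data"
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]

# Example (hypothetical path to an extracted batch file):
# pixels, labels = load_batch("./cifar-10-batches-py/data_batch_1")
# pixels[i] holds the 3,072 values of image i; labels[i] is its class index
```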
Each image is 32 x 32 pixels. Because the images are in color, there are three channels (red, green, blue). Each channel-pixel value is an integer between 0 and 255. Therefore, each image is represented by 32 * 32 * 3 = 3,072 values between 0 and 255.
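The order of the 3,072 values matters when you want to display or reshape an image. The sketch below assumes the layout described in the dataset's documentation (not stated in this post): the first 1,024 values are the red channel, the next 1,024 are green, and the last 1,024 are blue, with each channel stored row by row. The helper name pixel_rgb is hypothetical.

```python
SIZE = 32             # images are 32 x 32
PLANE = SIZE * SIZE   # 1,024 values per color channel

def pixel_rgb(image, row, col):
    """Return (red, green, blue) for one pixel of a flat 3,072-value image."""
    offset = row * SIZE + col  # row-major position within a channel
    return (image[offset], image[PLANE + offset], image[2 * PLANE + offset])

# Synthetic image: red plane all 10, green plane all 20, blue plane all 30
demo = [10] * PLANE + [20] * PLANE + [30] * PLANE
print(pixel_rgb(demo, 0, 0))    # (10, 20, 30)
print(pixel_rgb(demo, 31, 31))  # (10, 20, 30)
```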
To convert the CIFAR-10 images from binary pickle format to text, you need to write a short Python language program. My article presents such a program and explains how to modify it to suit any scenario. After unpickling the source data, the key lines of code are:
fn = ".\\cifar10_train_5000.txt"  # file to save to
fout = open(fn, 'w', encoding='utf-8')
for i in range(n_images):  # n images
    for j in range(3072):  # write the pixels
        val = pixels[i][j]
        fout.write(str(val) + ",")
    fout.write(str(labels[i]) + "\n")  # write the label
fout.close()
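Each line of the resulting text file holds 3,072 comma-separated pixel values followed by the class label. A rough sketch of reading such a file back into memory is shown below. The function name read_cifar_text is hypothetical, and dividing by 255 to scale pixels to the range [0, 1] is a common convention that I am assuming here, not something the file format requires.

```python
def read_cifar_text(path, n_pixels=3072):
    """Read the comma-delimited file; return (scaled pixel rows, labels)."""
    xs, ys = [], []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tokens = line.split(",")
            # first n_pixels tokens are pixel values, the next one is the label
            xs.append([int(t) / 255.0 for t in tokens[:n_pixels]])
            ys.append(int(tokens[n_pixels]))
    return xs, ys

# Example (file name taken from the code above):
# xs, ys = read_cifar_text(".\\cifar10_train_5000.txt")
```

The returned lists could then be converted to tensors, for example with torch.tensor, before being fed to a PyTorch classifier.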
I don’t always enjoy working with raw data — I have more fun with algorithms. But the CIFAR-10 data is kind of interesting.

CIFAR images are for research but I enjoy looking at pulp science fiction novel cover images just for fun. Left: By artist Jack Gaughan. Center: By artist Gene Szafran. Right: By artist Richard Powers.

