I wrote an article titled “CIFAR-10 Image Classification Using PyTorch” in the April 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/04/11/pytorch-image-classification.aspx.
The article explains how to create a PyTorch image classification system for the CIFAR-10 dataset. CIFAR-10 images are crude 32 x 32 color images of 10 classes such as “frog” and “car.” I give a complete demo. The demo program creates a convolutional neural network (CNN) that has two convolutional layers and three linear layers. The demo program trains the network for 100 epochs.
After training, the demo program computes the classification accuracy of the model on the test data as 45.90 percent = 459 out of 1,000 correct. The classification accuracy is better than random guessing (which would give about 10 percent accuracy) but isn’t very good mostly because only 5,000 of the 50,000 training images were used. A model using all training data can get about 90 percent accuracy on the test data.
Next, the trained model is used to predict the class label for a specific test item. The demo displays the image, feeds it to the trained model, and displays the 10 output logit values. The largest value, -0.016942, is at index location [6], which corresponds to class “frog,” so the prediction is correct.
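Because the network’s final layer applies log-softmax, exponentiating the output values recovers pseudo-probabilities that sum to 1. A minimal sketch of the idea, using made-up log-softmax values (not the demo’s actual outputs):

```python
import numpy as np

# hypothetical log-softmax outputs for one image (10 classes);
# illustrative values only, not the demo's actual outputs
log_probs = np.array([-2.5, -3.1, -2.8, -2.2, -2.9,
                      -2.6, -0.7, -3.0, -2.4, -2.7])

pseudo_probs = np.exp(log_probs)        # undo the log
pseudo_probs /= pseudo_probs.sum()      # renormalize (guards rounding)
pred_class = int(np.argmax(log_probs))  # largest log value wins

print(pred_class)                    # 6 -- the "frog" slot
print(round(pseudo_probs.sum(), 6))  # 1.0
```

Note that argmax over the log values and argmax over the pseudo-probabilities give the same answer, because exp() is monotonic.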
The full CIFAR-10 (Canadian Institute for Advanced Research, 10 classes) dataset has 50,000 training images and 10,000 test images. Each image is 32 x 32 pixels. Because the images are color, each image has three channels (red, green, blue). Each pixel-channel value is an integer between 0 and 255. Each image is one of 10 classes: plane (class 0), car, bird, cat, deer, dog, frog, horse, ship, truck (class 9).
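In the standard CIFAR-10 layout (which the demo’s reshape assumes), one stored image is a row of 3,072 values: 1,024 red channel values, then 1,024 green, then 1,024 blue. A minimal sketch of scaling and reshaping one such row, using synthetic values rather than real CIFAR-10 data:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-in for one CIFAR-10 row: 3,072 pixel values in 0-255
row = rng.integers(0, 256, size=3072).astype(np.float32)

row /= 255.0                  # scale raw 0-255 values to [0, 1]
img = row.reshape(3, 32, 32)  # channels-first (C, H, W), as PyTorch expects
red, green, blue = img[0], img[1], img[2]

print(img.shape)  # (3, 32, 32)
```

The channels-first ordering matters: matplotlib’s imshow() wants (H, W, C) instead, which is why the demo transposes with np.transpose(img_np, (1, 2, 0)) before displaying.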
Creating an image classification system is paradoxically easy and difficult. The demo program can be used as a template — easy. But understanding all the design alternatives related to CNN architecture is very difficult.

Convolutional neural network architecture is quite complex. One meaning of “convolution” is “curved or coiled.” The hobbit houses from the Lord of the Rings movie series have convolutional architecture in that curved sense. Left: You can visit the movie sets in Waikato, New Zealand. (Note: one of the earliest machine learning software tools, WEKA, stands for Waikato Environment for Knowledge Analysis, from the University of Waikato.) Center: A real-life hobbit-style house near Tomich, Scotland. Right: A real-life hobbit-style house near Osoyoos, Canada.
Demo code:
# cifar_cnn.py
# PyTorch 1.10.0-CPU Anaconda3-2020.02 Python 3.7.6
# Windows 10/11
import numpy as np
import torch as T
import matplotlib.pyplot as plt
device = T.device('cpu')
# -----------------------------------------------------------
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.conv1 = T.nn.Conv2d(3, 6, 5)  # in, out, k, (s=1)
    self.conv2 = T.nn.Conv2d(6, 16, 5)
    self.pool = T.nn.MaxPool2d(2, stride=2)
    self.fc1 = T.nn.Linear(16 * 5 * 5, 120)  # 400-120-84-10
    self.fc2 = T.nn.Linear(120, 84)
    self.fc3 = T.nn.Linear(84, 10)

  def forward(self, x):
    z = T.nn.functional.relu(self.conv1(x))  # [10, 6, 28, 28]
    z = self.pool(z)                         # [10, 6, 14, 14]
    z = T.nn.functional.relu(self.conv2(z))  # [10, 16, 10, 10]
    z = self.pool(z)                         # [10, 16, 5, 5]
    z = z.reshape(-1, 16 * 5 * 5)            # [bs, 400]
    z = T.nn.functional.relu(self.fc1(z))
    z = T.nn.functional.relu(self.fc2(z))
    z = T.log_softmax(self.fc3(z), dim=1)    # NLLLoss()
    return z
# -----------------------------------------------------------
class CIFAR10_Dataset(T.utils.data.Dataset):
  # 3072 comma-delim pixel values (0-255) then label (0-9)
  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0, 3073),
      delimiter=",", comments="#", dtype=np.float32)
    tmp_x = all_xy[:, 0:3072]  # all rows, cols [0, 3071]
    tmp_x /= 255.0
    tmp_x = tmp_x.reshape(-1, 3, 32, 32)  # bs, chnls, 32x32
    tmp_y = all_xy[:, 3072]  # 1-D required
    self.x_data = \
      T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = \
      T.tensor(tmp_y, dtype=T.int64).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    lbl = self.y_data[idx]
    pixels = self.x_data[idx]
    return (pixels, lbl)
# -----------------------------------------------------------
def accuracy(model, ds):
  X = ds[0:len(ds)][0]  # all images
  Y = ds[0:len(ds)][1]  # all targets
  with T.no_grad():
    logits = model(X)
  predicteds = T.argmax(logits, dim=1)
  num_correct = T.sum(Y == predicteds)
  acc = (num_correct * 1.0) / len(ds)
  return acc.item()
# -----------------------------------------------------------
def main():
  # 0. setup
  print("\nBegin CIFAR-10 with raw data CNN demo ")
  np.random.seed(1)
  T.manual_seed(1)

  # 1. create Dataset
  print("\nLoading 5000 train and 1000 test images ")
  train_file = ".\\Data\\cifar10_train_5000.txt"
  train_ds = CIFAR10_Dataset(train_file)
  test_file = ".\\Data\\cifar10_test_1000.txt"
  test_ds = CIFAR10_Dataset(test_file)
  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  print("\nCreating CNN with 2 conv and 400-120-84-10 ")
  net = Net().to(device)

  # -----------------------------------------------------------

  # 3. train model
  max_epochs = 100
  ep_log_interval = 10
  lrn_rate = 0.005
  loss_func = T.nn.NLLLoss()  # log-softmax() activation
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)

  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = SGD")
  print("max_epochs = %3d " % max_epochs)
  print("lrn_rate = %0.3f " % lrn_rate)

  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    # T.manual_seed(1 + epoch)
    epoch_loss = 0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      (X, Y) = batch  # X = pixels, Y = target labels
      optimizer.zero_grad()
      oupt = net(X)  # X is Size([bat_size, 3, 32, 32])
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()
    if epoch % ep_log_interval == 0:
      print("epoch = %4d | loss = %10.4f | " % \
        (epoch, epoch_loss), end="")
      net.eval()
      acc = accuracy(net, train_ds)
      net.train()
      print(" acc = %6.4f " % acc)
  print("Done ")

  # -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc_test = accuracy(net, test_ds)  # all at once
  print("Accuracy on test data = %0.4f" % acc_test)

  # 5. TODO: save trained model

  # -----------------------------------------------------------

  # 6. use model to make a prediction
  print("\nPrediction for test image [29] ")
  img = test_ds[29][0]
  label = test_ds[29][1]
  img_np = img.numpy()  # 3,32,32
  img_np = np.transpose(img_np, (1, 2, 0))
  plt.imshow(img_np)
  plt.show()

  labels = ['plane', 'car', 'bird', 'cat', 'deer', 'dog',
    'frog', 'horse', 'ship', 'truck']
  img = img.reshape(1, 3, 32, 32)  # make it a batch
  with T.no_grad():
    logits = net(img)
  y = T.argmax(logits)  # 0 to 9 as a tensor
  print(logits)  # 10 log-softmax values
  print(y.item())
  print(labels[y.item()])  # like "frog"
  # pp = T.exp(logits)  # pseudo-probabilities
  # print(pp)

  if y.item() == label.item():
    print("correct")
  else:
    print("wrong")

  print("\nEnd CIFAR-10 CNN demo ")

if __name__ == "__main__":
  main()
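The shape comments in the forward() method can be verified with the standard convolution/pooling size formula: with no padding, an n x n input passed through a k x k kernel at stride s produces floor((n - k) / s) + 1. A pure-Python sketch tracing the demo network's 32 x 32 input:

```python
def out_size(n, k, s=1):
  # spatial output size for a conv or pool layer with no padding
  return (n - k) // s + 1

n = 32                   # CIFAR-10 images are 32 x 32
n = out_size(n, 5)       # conv1, 5x5 kernel, stride 1 -> 28
n = out_size(n, 2, s=2)  # 2x2 max-pool, stride 2      -> 14
n = out_size(n, 5)       # conv2, 5x5 kernel, stride 1 -> 10
n = out_size(n, 2, s=2)  # 2x2 max-pool, stride 2      -> 5

flat = 16 * n * n  # 16 channels * 5 * 5 = 400 inputs to fc1
print(n, flat)     # 5 400
```

This arithmetic is why fc1 is declared as T.nn.Linear(16 * 5 * 5, 120): change any kernel size or add padding and the 400 must be recomputed to match.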

