To train a PyTorch neural network, the most common approach is to read training data into a Dataset object, and then use a DataLoader object to serve the training data up in batches. When I implement a Dataset, I almost always use the NumPy loadtxt() function to read training data from a file into memory. But it’s possible to use the Pandas read_csv() function instead. Bottom line: the Pandas approach isn’t especially useful because the Pandas data frame has to be converted to a NumPy matrix anyway before it can be turned into PyTorch tensors.
I used one of my standard examples to code up a demo of NumPy loadtxt() vs Pandas read_csv() functions. The goal is to predict political leaning (conservative = 0, moderate = 1, liberal = 2) from sex, age, state of residence, and income. The data looks like:
 1  0.24  1  0  0  0.2950  2
-1  0.39  0  0  1  0.5120  1
 1  0.63  0  1  0  0.7580  0
-1  0.36  1  0  0  0.4450  1
 1  0.27  0  1  0  0.2860  2
. . .
The columns are sex (M = -1, F = +1), age divided by 100, state (Michigan = 100, Nebraska = 010, Oklahoma = 001), income divided by $100,000, and political leaning. The data is synthetic.
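To make the encoding concrete, here is a minimal sketch of how one raw record could be turned into a row of the data file. The helper name encode_person() and the two lookup dictionaries are hypothetical, invented for illustration; they are not part of the demo program.

```python
# Hypothetical lookup tables that mirror the encoding described in the text.
STATE_ONE_HOT = {
    "Michigan": [1, 0, 0],
    "Nebraska": [0, 1, 0],
    "Oklahoma": [0, 0, 1],
}
POLITICS_LABEL = {"conservative": 0, "moderate": 1, "liberal": 2}


def encode_person(sex, age, state, income):
    """Return the six predictor values for one person (hypothetical helper)."""
    sex_code = -1 if sex == "M" else 1            # M = -1, F = +1
    return [sex_code, age / 100, *STATE_ONE_HOT[state], income / 100_000]


# A female, age 24, from Michigan, earning $29,500, who is liberal.
row = encode_person("F", 24, "Michigan", 29_500.0)
label = POLITICS_LABEL["liberal"]
print(row, label)
```

The printed values correspond to the first data line shown above.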
A standard NumPy loadtxt() version of a Dataset is:
import numpy as np
import pandas as pd  # not used in this version
import torch as T

device = T.device("cpu")  # assumed here; set in the full program

class PeopleDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # numpy loadtxt() version
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D
    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return preds, trgts  # as a Tuple
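For context, here is a rough sketch of how a Dataset like this one gets served up in batches by a DataLoader. Everything specific in it is an assumption for illustration: the class name TinyPeopleDataset, the temporary file, the CPU device, and the batch size of 2.

```python
import os
import tempfile

import numpy as np
import torch as T

device = T.device("cpu")  # assumption: CPU only


class TinyPeopleDataset(T.utils.data.Dataset):
    """Compact stand-in for the loadtxt() Dataset (name is hypothetical)."""

    def __init__(self, src_file):
        all_xy = np.loadtxt(src_file, usecols=range(0, 7),
                            delimiter="\t", comments="#", dtype=np.float32)
        self.x_data = T.tensor(all_xy[:, 0:6], dtype=T.float32).to(device)
        self.y_data = T.tensor(all_xy[:, 6], dtype=T.int64).to(device)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        return self.x_data[idx], self.y_data[idx]


# Write the five sample rows to a temporary tab-delimited file.
rows = [
    "1\t0.24\t1\t0\t0\t0.2950\t2",
    "-1\t0.39\t0\t0\t1\t0.5120\t1",
    "1\t0.63\t0\t1\t0\t0.7580\t0",
    "-1\t0.36\t1\t0\t0\t0.4450\t1",
    "1\t0.27\t0\t1\t0\t0.2860\t2",
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("# synthetic sample rows\n")
    f.write("\n".join(rows) + "\n")
    path = f.name

ds = TinyPeopleDataset(path)
loader = T.utils.data.DataLoader(ds, batch_size=2, shuffle=False)

# Each iteration yields one batch of predictors and one batch of targets.
batch_shapes = []
for preds, trgts in loader:
    batch_shapes.append((tuple(preds.shape), tuple(trgts.shape)))
    print(preds.shape, trgts.shape)

os.remove(path)
```

With five rows and a batch size of 2, the last batch holds the single leftover item.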
A version using the Pandas read_csv() function and the to_numpy() method is:
class PeopleDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # pandas version
    # header=None: the file has no header row, so the
    # first data line must not be read as column names
    xy_frame = pd.read_csv(src_file, usecols=range(0,7),
      delimiter="\t", comment="#", header=None,
      dtype=np.float32)
    all_xy = xy_frame.to_numpy()
    # as above
    . . .
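A quick way to check that the two readers agree is to run both on the same text. The sketch below uses an in-memory string in place of the real data file, which is an assumption made to keep the example self-contained. It also shows one detail to watch: read_csv() treats the first data line as a header row unless header=None is passed, while loadtxt() has no such behavior.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for the tab-delimited data file (assumption).
text = (
    "# synthetic sample rows\n"
    "1\t0.24\t1\t0\t0\t0.2950\t2\n"
    "-1\t0.39\t0\t0\t1\t0.5120\t1\n"
    "1\t0.63\t0\t1\t0\t0.7580\t0\n"
    "-1\t0.36\t1\t0\t0\t0.4450\t1\n"
    "1\t0.27\t0\t1\t0\t0.2860\t2\n"
)

via_numpy = np.loadtxt(io.StringIO(text), usecols=range(0, 7),
                       delimiter="\t", comments="#", dtype=np.float32)

via_pandas = pd.read_csv(io.StringIO(text), usecols=list(range(7)),
                         delimiter="\t", comment="#", header=None,
                         dtype=np.float32).to_numpy()

# Without header=None the first data line is consumed as column names.
no_header_arg = pd.read_csv(io.StringIO(text), usecols=list(range(7)),
                            delimiter="\t", comment="#", dtype=np.float32)

print(via_numpy.shape, via_pandas.shape, len(no_header_arg))
print(np.allclose(via_numpy, via_pandas))
```

Both readers produce a float32 matrix of the same shape, which is why the Pandas route adds a step without adding much value here.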
Instead of using the Pandas to_numpy() method, it’s possible to pull the values out of the Pandas data frame using the iloc indexer and wrap the result in np.array():
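class PeopleDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # pandas version
    # header=None: the file has no header row, so the
    # first data line must not be read as column names
    xy_frame = pd.read_csv(src_file, usecols=range(0,7),
      delimiter="\t", comment="#", header=None,
      dtype=np.float32)
    all_xy = np.array(xy_frame.iloc[:,:])  # all rows, all cols
    # as above
    . . .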
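A variation, which is my own sketch and not part of the demo program, is to let iloc split the predictor and target columns directly instead of building the intermediate all_xy matrix. The in-memory sample text is an assumption used in place of the real file.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for the tab-delimited data file (assumption).
text = (
    "1\t0.24\t1\t0\t0\t0.2950\t2\n"
    "-1\t0.39\t0\t0\t1\t0.5120\t1\n"
    "1\t0.63\t0\t1\t0\t0.7580\t0\n"
)

xy_frame = pd.read_csv(io.StringIO(text), sep="\t", header=None,
                       dtype=np.float32)

# iloc uses positional, end-exclusive slicing, like NumPy.
tmp_x = xy_frame.iloc[:, 0:6].to_numpy()                   # predictors, 2-D
tmp_y = xy_frame.iloc[:, 6].to_numpy().astype(np.int64)    # labels, 1-D

print(tmp_x.shape, tmp_y)
```

Either way, the values still end up as NumPy arrays before they are handed to T.tensor().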
The rest of the program and the training and test data can be found at: https://jamesmccaffreyblog.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
There’s no big moral to this story, just a fun mental exercise to stay in practice with PyTorch.

Two wonderful illustrations tagged as “amazingsurf” from fractal.batjorge.com. I don’t know the artist, but I’ll bet he does artistic exercises to stay in practice.
