Transformer architecture (TA) neural networks were designed for natural language processing (NLP). I’ve been exploring the idea of applying TA to tabular data. The problem is that in NLP all inputs are integers that represent words/tokens. For example, an input of “I think therefore I am” is mapped to integer tokens something like [19, 47, 132, 19, 27]. Then each integer token is converted to an embedding vector. For example, token 19 might map to [0.1234, -1.0987, 0.3579, 1.1333], where the number of values (4 here) is a hyperparameter called the embedding dim. The embedding values are determined during training.
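For reference, here is a minimal sketch of that standard NLP mechanism using the built-in torch.nn.Embedding layer. The vocabulary size (200) and the token IDs are made-up values for illustration:

```python
# Minimal sketch of the standard NLP mechanism using the built-in
# torch.nn.Embedding layer. The vocabulary size (200) and the token
# IDs are made-up values for illustration.
import torch as T

T.manual_seed(0)
embed = T.nn.Embedding(num_embeddings=200, embedding_dim=4)
tokens = T.tensor([19, 47, 132, 19, 27])  # "I think therefore I am"
vectors = embed(tokens)  # shape [5, 4] -- one vector per token
print(vectors.shape)     # torch.Size([5, 4])
# the two occurrences of token 19 map to the identical vector
print(T.equal(vectors[0], vectors[3]))  # True
```

Notice that nn.Embedding is essentially a lookup table indexed by integer, which is exactly why it can’t be used directly on non-integer input.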

Demo of a custom embedding layer for numeric input data
Now suppose that instead of dealing with NLP input, you are dealing with numeric input such as a person’s normalized age of 0.31 and normalized annual income of 0.7850. Because the inputs are not integers, you can’t use the PyTorch built-in torch.nn.Embedding layer to create embedding vectors. I wondered if it would be possible to create a custom embedding layer that converts numeric input into embedding vectors.
After some experimentation I managed to create an example of a custom PyTorch embedding layer for numeric input data.

When I design complex neural architectures I often use pen and paper. Here’s the paper I used while designing the code presented in this blog post. The paw in the lower right is a canine visitor named “Llama” who was helping me.
I used the Iris dataset. It has four numeric input values: sepal length, sepal width, petal length, petal width. The goal is to classify an iris flower as one of three species: setosa (0), versicolor (1), or virginica (2). Each input is converted to an embedding vector with 2 values.
Note: Conceptually, a word embedding creates vectors where similar words (“boy” and “man”) are mathematically close together. For numeric input, an embedding doesn’t do that. Instead, the idea is to create a layer that isn’t fully connected, so each input feature is transformed independently and the features don’t interact with each other at the embedding stage. The ideas are pretty deep.
My experiment was hard-coded specifically for the Iris dataset and is just a proof of concept. The idea is to create a separate weight matrix for each of the four input values. Each of the four inputs generates a temp result matrix, and then the four temp matrices are combined into the final result.
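The per-feature idea above can be sketched in a few lines. This is a standalone illustration with made-up shapes and random values, not the demo code itself:

```python
# Minimal standalone sketch of the per-feature weight idea, with
# made-up values: each of the 4 inputs gets its own [1, embed_dim]
# weight, and the 4 partial results are concatenated.
import torch as T

T.manual_seed(0)
bs, embed_dim = 3, 2
x = T.rand(bs, 4)  # 4 numeric inputs per data item
weights = [T.rand(1, embed_dim) for _ in range(4)]
parts = [T.mm(x[:, i:i+1], weights[i]) for i in range(4)]  # each [bs, 2]
res = T.hstack(parts)  # [bs, 8] final embedding
print(res.shape)       # torch.Size([3, 8])
```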
The key network definition code looks like:

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()  # classic syntax
    self.embed = NumericEmbedLayer(4, 2)  # 4-8
    self.hid1 = T.nn.Linear(8, 10)  # 8-10
    self.oupt = T.nn.Linear(10, 3)  # 10-3

  def forward(self, x):  # x is [bs, 4]
    z = self.embed(x)  # z is [bs, 8]
    z = T.tanh(self.hid1(z))  # z is [bs, 10]
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss()
    return z  # z is [bs, 3]
The 4 inputs are fed to the custom NumericEmbedLayer which produces 8 values. Those 8 values go to a hidden layer which outputs 10 values. The final output layer maps the 10 values to 3 values.
The experiment was a lot more difficult than I thought it’d be. Creating a general-purpose embedding layer for arbitrary numeric input would require a significant effort. Maybe I’ll get around to it some day.

Three nice images from a search for “embedded portrait”. Left: By artist Hans Jochem Bakker. Center: By artist Christopher Kennedy. Right: By artist Daniel Arrhakis.
The complete demo code is below.
# iris_embedding.py
# PyTorch 1.10.0-CPU Anaconda3-2020.02 Python 3.7.6
# Windows 10/11
# experiment with embedding for numeric data
import numpy as np
import torch as T
device = T.device('cpu') # apply to Tensor or Module
# -----------------------------------------------------------
class NumericEmbedLayer(T.nn.Module):
  def __init__(self, n_in, embed_dim):  # n_in = 4 for Iris
    super().__init__()  # shortcut syntax
    # hard-coded for Iris dataset - not a general soln
    # one weight matrix per feature
    self.weights_0 = T.nn.Parameter(T.zeros((embed_dim, 1),
      dtype=T.float32))
    self.weights_1 = T.nn.Parameter(T.zeros((embed_dim, 1),
      dtype=T.float32))
    self.weights_2 = T.nn.Parameter(T.zeros((embed_dim, 1),
      dtype=T.float32))
    self.weights_3 = T.nn.Parameter(T.zeros((embed_dim, 1),
      dtype=T.float32))
    # no biases
    T.nn.init.uniform_(self.weights_0, -0.10, 0.10)
    T.nn.init.uniform_(self.weights_1, -0.10, 0.10)
    T.nn.init.uniform_(self.weights_2, -0.10, 0.10)
    T.nn.init.uniform_(self.weights_3, -0.10, 0.10)

  def forward(self, x):
    col_0 = x[:,0:1]  # fetch each input column
    col_1 = x[:,1:2]
    col_2 = x[:,2:3]
    col_3 = x[:,3:4]
    # create the embeddings
    tmp_0 = T.mm(col_0, self.weights_0.t())  # [bs, 2]
    tmp_1 = T.mm(col_1, self.weights_1.t())
    tmp_2 = T.mm(col_2, self.weights_2.t())
    tmp_3 = T.mm(col_3, self.weights_3.t())
    # combine
    res = T.hstack((tmp_0, tmp_1, tmp_2, tmp_3))  # [bs, 8]
    return res
# -----------------------------------------------------------
class IrisDataset(T.utils.data.Dataset):
  def __init__(self, src_file, num_rows=None):
    # 5.0, 3.5, 1.3, 0.3, 0
    tmp_x = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,4), delimiter=",", comments="#",
      dtype=np.float32)
    tmp_y = np.loadtxt(src_file, max_rows=num_rows,
      usecols=4, delimiter=",", comments="#",
      dtype=np.int64)
    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y, dtype=T.int64).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    spcs = self.y_data[idx]
    sample = { 'predictors' : preds, 'species' : spcs }
    return sample  # as Dictionary
# -----------------------------------------------------------
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()  # classic syntax
    # super().__init__()  # shortcut syntax
    self.embed = NumericEmbedLayer(4, 2)  # 4-8
    self.hid1 = T.nn.Linear(8, 10)  # 8-10
    self.oupt = T.nn.Linear(10, 3)  # 10-3

    # override default initialization
    lo = -0.10; hi = +0.10
    T.nn.init.uniform_(self.hid1.weight, lo, hi)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.uniform_(self.oupt.weight, lo, hi)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):  # x is [bs, 4]
    z = self.embed(x)  # z is [bs, 8]
    z = T.tanh(self.hid1(z))  # z is [bs, 10]
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss()
    return z  # z is [bs, 3]
# -----------------------------------------------------------
def accuracy(model, dataset):
  # assumes model.eval()
  dataldr = T.utils.data.DataLoader(dataset, batch_size=1,
    shuffle=False)
  n_correct = 0; n_wrong = 0
  for (_, batch) in enumerate(dataldr):
    X = batch['predictors']
    Y = batch['species']  # already 1D shaped by Dataset
    with T.no_grad():
      oupt = model(X)  # log-probabilities
    big_idx = T.argmax(oupt)
    # if big_idx.item() == Y.item():
    if big_idx == Y:
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc
# -----------------------------------------------------------
def main():
  # 0. get started
  print("\nBegin Iris numeric embedding experiment ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create DataLoader objects
  print("\nCreating Iris train and test Datasets ")
  train_file = ".\\Data\\iris_train.txt"
  test_file = ".\\Data\\iris_test.txt"
  train_ds = IrisDataset(train_file)  # 120 items
  test_ds = IrisDataset(test_file)    # 30 items
  bat_size = 6
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # ---------------------------------------------------------
  # 2. create network
  print("\nCreating 4-(8)-10-3 neural network ")
  net = Net().to(device)

  # 3. train model
  max_epochs = 500
  ep_log_interval = 50
  lrn_rate = 0.01
  loss_func = T.nn.NLLLoss()  # assumes log_softmax()
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)

  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = SGD")
  print("max_epochs = %3d " % max_epochs)
  print("lrn_rate = %0.3f " % lrn_rate)

  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch['predictors']  # [bat_size, 4]
      Y = batch['species']     # OK; already flattened
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()  # compute gradients
      optimizer.step()     # update weights and biases
    if epoch % ep_log_interval == 0:
      print("epoch = %4d | loss = %8.4f | " % \
        (epoch, epoch_loss), end="")
      net.eval()
      train_acc = accuracy(net, train_ds)
      print(" acc = %8.4f " % train_acc)
      net.train()
  print("Done ")

  # ---------------------------------------------------------
  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc = accuracy(net, test_ds)  # item-by-item
  print("Accuracy on test data = %0.4f" % acc)

  # 5. make a prediction
  print("\nPredicting species for [6.1, 3.1, 5.1, 1.1]: ")
  x = np.array([[6.1, 3.1, 5.1, 1.1]], dtype=np.float32)
  x = T.tensor(x, dtype=T.float32).to(device)
  with T.no_grad():
    logits = net(x)  # log_softmax output
  probs = T.exp(logits)  # pseudo-probabilities
  T.set_printoptions(precision=4)
  print(probs)

  # ---------------------------------------------------------
  # 6. save model (state_dict approach)
  print("\nSaving trained model state")
  fn = ".\\Models\\iris_model.pt"
  T.save(net.state_dict(), fn)
  # saved_model = Net()
  # saved_model.load_state_dict(T.load(fn))
  # use saved_model to make prediction(s)

  print("\nEnd numeric embedding experiment ")

if __name__ == "__main__":
  main()
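As a side note, and this is my own observation rather than something from the demo: because each feature gets its own small weight matrix and the results are concatenated, the custom layer computes the same thing as a bias-free Linear(4, 8) layer whose weight matrix has a block structure. A quick sketch to check the equivalence, using made-up random values:

```python
# My own observation, not from the demo: the custom layer computes the
# same thing as a bias-free Linear(4, 8) whose weight matrix is block-
# structured, so each pair of output values depends on only one input.
import torch as T

T.manual_seed(1)
w = [T.rand(2, 1) for _ in range(4)]  # one [embed_dim, 1] matrix per feature

lin = T.nn.Linear(4, 8, bias=False)
with T.no_grad():
  lin.weight.zero_()
  for i in range(4):
    lin.weight[2*i:2*i+2, i:i+1] = w[i]  # place each block

x = T.rand(5, 4)
# the custom layer's computation: per-column mm, then hstack
manual = T.hstack([T.mm(x[:, i:i+1], w[i].t()) for i in range(4)])
print(T.allclose(lin(x), manual))  # True
```

Training the masked Linear version would require re-zeroing the off-block weights after every optimizer step (gradients flow to all entries), which is one reason a dedicated layer class is cleaner.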