If you are designing a neural network multi-class classifier using PyTorch, you can use cross entropy loss (torch.nn.CrossEntropyLoss) with logits output (no activation) in the forward() method, or you can use negative log-likelihood loss (torch.nn.NLLLoss) with log-softmax (torch.nn.LogSoftmax() module or torch.log_softmax() function) in the forward() method. Whew! That's a mouthful. Let me explain with some code examples.
Suppose you are working with the Iris Dataset, which has four predictor variables and three classes. The CrossEntropyLoss-with-logits technique is easier to implement and is by far the more common of the two.
The demo run on the left uses CrossEntropyLoss with no activation on the output nodes. The demo run on the right uses NLLLoss with LogSoftmax activation on the output nodes. The results are identical.
A possible 4-7-3 network definition, and associated training code looks like:
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(4, 7)  # 4-7-3
    self.oupt = T.nn.Linear(7, 3)
    # initialize wts and biases here

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = self.oupt(z)  # logits output
    return z

# training
. . .
loss_func = T.nn.CrossEntropyLoss()
optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)
. . .
loop
  optimizer.zero_grad()  # reset accumulated gradients
  oupt = net(X)
  loss_obj = loss_func(oupt, Y)
  loss_obj.backward()
  optimizer.step()
end-loop
This CrossEntropyLoss-with-logits technique (logits just means no activation applied) is really just wrapper code around the older NLLLoss-with-LogSoftmax technique. That older approach could look like:
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(4, 7)  # 4-7-3
    self.oupt = T.nn.Linear(7, 3)
    self.apply_log_soft = T.nn.LogSoftmax(dim=1)  # Module
    # initialize wts and biases here

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = self.oupt(z)
    # z = T.log(T.softmax(z, dim=1))     # inefficient
    z = self.apply_log_soft(z)           # efficient
    # z = T.log_softmax(z, dim=1)        # function instead of Module
    return z

# training
. . .
loss_func = T.nn.NLLLoss()  # assumes LogSoftmax() applied
optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)
. . .
loop
  optimizer.zero_grad()  # reset accumulated gradients
  oupt = net(X)
  loss_obj = loss_func(oupt, Y)
  loss_obj.backward()
  optimizer.step()
end-loop
In short, when using the newer and simpler approach for multi-class classification, you don’t apply any activation to the output and then CrossEntropyLoss applies log-SoftMax internally. When using the older approach for multi-class classification, you apply LogSoftmax to the output and NLLLoss assumes you’ve done so.
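You can verify the equivalence directly. This small sketch (the logits and target values are made up for illustration) shows that CrossEntropyLoss on raw logits gives the same value as NLLLoss on log-softmax output:

```python
import torch as T

T.manual_seed(0)
logits = T.randn(5, 3)                 # raw outputs: 5 items, 3 classes
targets = T.tensor([0, 2, 1, 1, 0])    # class labels

# newer technique: CrossEntropyLoss applied to raw logits
ce = T.nn.CrossEntropyLoss()(logits, targets)

# older technique: LogSoftmax first, then NLLLoss
nll = T.nn.NLLLoss()(T.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())  # identical, up to float rounding
```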
When making a prediction, with the CrossEntropyLoss technique the raw output values will be logits so if you want to view probabilities you must apply SoftMax. With the older NLLLoss technique, the raw output values will be log of SoftMax so if you want to view probabilities you must apply the exp() function.
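Here is a small sketch of both conversions, using a made-up raw output vector for a single item:

```python
import torch as T

# pretend raw output for one item from a 4-7-3 network
logits = T.tensor([[2.0, 0.5, -1.0]])

# CrossEntropyLoss-style net emits logits: apply softmax() for probabilities
probs_from_logits = T.softmax(logits, dim=1)

# NLLLoss-style net emits log-softmax values: apply exp() for probabilities
log_sm = T.log_softmax(logits, dim=1)
probs_from_log_sm = T.exp(log_sm)

print(probs_from_logits)
print(probs_from_log_sm)  # same values; each row sums to 1
```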
To summarize, when designing a neural network multi-class classifier, you can use CrossEntropyLoss with no activation, or you can use NLLLoss with log-softmax activation. This applies only to multi-class classification — binary classification and regression problems have a different set of rules.
When designing a house, there are many alternatives. Some designs are better than others.

