When you define a neural network in PyTorch, each weight and bias gets a gradient. The gradient values are computed automatically (“autograd”) and then used to adjust the values of the weights and biases during training.
In the early days of PyTorch, you had to manipulate gradients yourself. High-level abstractions like the Module() class and the Linear() class take care of all the manipulation now (in most cases), but it's still interesting to see an example of what is happening behind the scenes.
Suppose you have some math function f(x) = y = x^2 + 3x + 1. The value of f(4) is 4^2 + 3*4 + 1 = 29. The calculus derivative of f(x) is f'(x) = dy/dx = 2x + 3, and so the value of the derivative at 4 is f'(4) = 2*4 + 3 = 11. For a function of a single variable like this one, the calculus derivative is essentially the same thing as the gradient.
I wrote a tiny demo to illustrate this example.
# gradient_demo.py
# compute the gradient of a simple function at x = 4

import torch as T
device = T.device("cpu")

def some_func(x):
  result = (x * x) + (3 * x) + 1
  return result

def main():
  print("\nBegin demo \n")

  # requires_grad=True tells autograd to track operations on x;
  # the .to(device) call is a no-op here because the tensor
  # is already on the CPU
  x = T.tensor([4.0], dtype=T.float32,
    requires_grad=True).to(device)
  y = some_func(x)
  print("x = " + str(x))
  print("y = " + str(y))
  print("")

  df = y.grad_fn  # the autograd function that produced y
  print("df = " + str(df))
  print("")

  y.backward()  # compute grad of some_func(4)
  print("gradient of func(x) = ")
  print(x.grad)  # 2*4 + 3 = 11

if __name__ == "__main__":
  main()
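When I run the demo, the output looks like this (the memory address attached to the grad_fn object will vary from run to run, so I've elided it):

Begin demo 

x = tensor([4.], requires_grad=True)
y = tensor([29.], grad_fn=<AddBackward0>)

df = <AddBackward0 object at 0x...>

gradient of func(x) = 
tensor([11.])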
The code is short but very dense in terms of ideas; if you walk through each line, the demo should (eventually) make sense. The demo works with a single value, [4.0], but in all non-demo scenarios you'd be working with tensors that hold several values, such as [2.1, 5.4, 3.2]. The principles are the same, as the sketch below shows.
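Here is a minimal sketch of the multi-value case. It isn't part of the original demo, and the input values are made up. Because backward() must be called on a scalar, the sketch sums the three outputs before calling it, so each element of x.grad still holds 2*x + 3:

# vector_gradient.py
# gradient of some_func for a tensor with several values
# (illustrative values -- not from the original demo)

import torch as T

def some_func(x):
  return (x * x) + (3 * x) + 1

x = T.tensor([2.1, 5.4, 3.2], dtype=T.float32,
  requires_grad=True)
y = some_func(x)

y.sum().backward()  # backward() needs a scalar, so reduce with sum()
print(x.grad)       # 2*x + 3 for each element: [7.2, 13.8, 9.4]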
Let me emphasize that understanding how to directly manipulate PyTorch gradients isn't necessary if you're doing standard things like training a neural network. You only need to work with gradients at a low level if you're creating some sort of custom system. I've noticed this causes confusion for beginners, because many introductory PyTorch tutorials explain gradients in a fair amount of detail, but then that information is never used when creating a neural network. In a standard training loop, the gradient machinery is hidden inside two calls, as sketched below.
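For context, here's a minimal sketch of a standard training loop. The one-layer network, the data, and the learning rate are all made-up placeholders; the point is only to show where loss.backward() (compute the gradients) and optimizer.step() (use the gradients to adjust the weights and biases) fit in:

# training_sketch.py
# where gradients live in a standard training loop
# (network, data, and hyperparameters are made up)

import torch as T
device = T.device("cpu")

net = T.nn.Linear(3, 1).to(device)  # tiny one-layer network
loss_func = T.nn.MSELoss()
optimizer = T.optim.SGD(net.parameters(), lr=0.01)

X = T.tensor([[2.1, 5.4, 3.2]], dtype=T.float32).to(device)  # dummy input
Y = T.tensor([[1.0]], dtype=T.float32).to(device)            # dummy target

for epoch in range(10):
  optimizer.zero_grad()      # reset gradients from the previous iteration
  loss = loss_func(net(X), Y)
  loss.backward()            # autograd computes gradients for all weights
  optimizer.step()           # gradients adjust the weights and biases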


Excellent explanation, and really great background with the pictures. I don't know exactly how you do it, but it's a great opening for 2021.
Meanwhile, my understanding of gradients feels like an approximation that gets one step closer to the target with each explanation.
When climbing, or more precisely bouldering, it helps immensely to be around good climbers. Every single route represents a problem, ranging from very easy to impossible. But even if a route seems impossible, if you sharpen your skills enough and watch the good climbers closely as they attempt it, you will be able to surprise yourself.
In climbing terms, your efforts are like those of an Alex Megos. In the early days of climbing there were routes that were considered impossible, but with people like Alex, the space of what is possible expanded in all directions.