What happens inside PyTorch when loss.backward() is called?
Updated May 17, 2026
Short answer
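loss.backward() walks the autograd graph backward from the loss and applies the chain rule at each recorded operation to compute the gradient of the loss with respect to every leaf tensor that has requires_grad=True.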
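As a minimal sketch of the chain rule in action, the toy computation below (the values and variable names are illustrative, not taken from any particular model) builds a two-step graph and checks the resulting gradient by hand.

```python
import torch

# A single leaf tensor that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

y = x ** 2          # local derivative: dy/dx = 2x
loss = 2 * y + 1    # local derivative: dloss/dy = 2

# Reverse pass: dloss/dx = dloss/dy * dy/dx = 2 * 2x = 4x
loss.backward()

grad_x = x.grad.item()
print(grad_x)  # 12.0 for x = 3.0
```

The gradient ends up in x.grad because x is a leaf; the intermediate tensor y does not keep a .grad by default.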
Deep explanation
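When backward() is called, PyTorch starts from the loss node and traverses the dynamically built computation graph in reverse topological order. The loss is normally a scalar; for a non-scalar tensor, an explicit gradient argument must be passed to backward(). Every tensor produced by a tracked operation carries a grad_fn, a backward node that takes the incoming gradient, multiplies it by the operation's local derivative, and passes the result on to the nodes that produced its inputs. When the traversal reaches a leaf tensor with requires_grad=True, the gradient is added to that tensor's .grad field, so repeated backward passes accumulate rather than overwrite until the gradients are reset (for example with optimizer.zero_grad()). The tensors saved for the backward pass are freed once it finishes, unless retain_graph=True is set, which is why a second backward() through the same graph normally raises an error.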
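The behavior described above can be observed directly. The following is a small sketch with a made-up two-element tensor w (a hypothetical stand-in for a model parameter); it shows that only non-leaf tensors have a grad_fn, that gradients accumulate in .grad across calls, and that the saved buffers are released after a backward pass that does not set retain_graph=True.

```python
import torch

# Leaf tensor: created by the user, so it has no grad_fn.
w = torch.tensor([1.0, 2.0], requires_grad=True)

# Non-leaf tensors: each one records the backward node that produced it.
h = w * w            # the multiply saves its inputs for the backward pass
loss = h.sum()       # d(loss)/dw = 2 * w

leaf_has_grad_fn = w.grad_fn is not None        # False
loss_has_grad_fn = loss.grad_fn is not None     # True

# First pass: keep the saved tensors so the graph can be reused.
loss.backward(retain_graph=True)
first = w.grad.clone()    # 2 * w

# Second pass: the new gradient is added to the existing .grad.
loss.backward()
second = w.grad.clone()   # 2 * w + 2 * w

# The second pass freed the saved tensors, so a third pass fails.
try:
    loss.backward()
    third_call_failed = False
except RuntimeError:
    third_call_failed = True

print(first, second, third_call_failed)
```

In a training loop this accumulation is the reason gradients are usually cleared between iterations, and retain_graph=True is typically only needed when the same graph is deliberately backpropagated through more than once.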
Real-world example
No real-world example available yet.
Common mistakes
No common mistakes listed yet.
Follow-up questions
No follow-up questions available yet.