What happens inside PyTorch when loss.backward() is called?

Updated May 17, 2026

Short answer

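loss.backward() triggers a reverse traversal of the autograd graph recorded during the forward pass and applies the chain rule at each node to compute the gradient of the loss with respect to every leaf tensor that has requires_grad=True.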
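As a minimal sketch of the chain rule at work, the example below builds a two-step computation and checks the gradient by hand. The variable names and the function itself are illustrative assumptions, not part of the original answer.

```python
import torch

# Leaf tensor that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

y = x ** 2         # local derivative: dy/dx = 2x
loss = 2 * y + 1   # local derivative: dloss/dy = 2

# Reverse traversal: dloss/dx = dloss/dy * dy/dx = 2 * 2x = 12 at x = 3.
loss.backward()

print(x.grad)      # tensor(12.)
```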

Deep explanation

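When backward() is called, PyTorch starts at the loss tensor, which must be a scalar unless an explicit gradient argument is passed, and walks the dynamically built computation graph in reverse topological order. Each non-leaf tensor's grad_fn points to a backward node that takes the incoming gradient, combines it with the local derivative of its operation, and passes the result on to the nodes that produced its inputs. Gradients are summed into the .grad field of leaf tensors with requires_grad=True, so repeated backward() calls add to .grad rather than overwrite it. The tensors saved for the backward pass are freed once it completes, unless retain_graph=True is set.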
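The behaviour described above can be observed directly. The following is a small sketch, assuming a toy loss of sum(w * w); the tensor names are hypothetical and chosen only for illustration. It inspects the grad_fn nodes, shows that gradients accumulate across calls, and shows that a further backward pass fails once the saved buffers have been freed.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor
h = w * w                                         # non-leaf, produced by a multiply
loss = h.sum()                                    # scalar loss

# Each non-leaf tensor carries a backward node; leaves do not.
print(type(loss.grad_fn).__name__)   # SumBackward0
print(type(h.grad_fn).__name__)      # MulBackward0
print(w.is_leaf, w.grad_fn)          # True None

# next_functions links a node to the nodes that produced its inputs.
print(loss.grad_fn.next_functions)

# First pass keeps the saved tensors alive so the graph can be reused.
loss.backward(retain_graph=True)
first = w.grad.clone()               # d(sum(w*w))/dw = 2w

# Second pass adds to .grad instead of overwriting it, then frees the buffers.
loss.backward()
second = w.grad.clone()              # 2w + 2w

# A third pass fails because the saved tensors are gone.
try:
    loss.backward()
    freed = False
except RuntimeError:
    freed = True

print(first, second, freed)
```

In a training loop this accumulation is why gradients are usually reset between steps, for example with optimizer.zero_grad().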
