Senior · PyTorch

What happens inside PyTorch when loss.backward() is called?

Updated May 17, 2026

Short answer

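loss.backward() triggers a reverse traversal of the autograd graph and applies the chain rule to compute the gradient of the loss with respect to every leaf tensor that has requires_grad=True, adding the results to those tensors' .grad fields.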
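Here is a minimal sketch of the short answer. It assumes a recent PyTorch install, and the tensor names and the function loss = 2x² + 1 are only illustrative. By the chain rule, d(loss)/dx = 2 · 2x = 4x, so at x = 3 autograd should store 12 in x.grad.

```python
import torch

# Leaf tensor we want a gradient for; requires_grad=True tells autograd to track it.
x = torch.tensor(3.0, requires_grad=True)

# Forward pass builds the graph: y = x**2, loss = 2*y + 1, so d(loss)/dx = 4x.
y = x ** 2
loss = 2 * y + 1

# Reverse traversal: autograd applies the chain rule from loss back to x.
loss.backward()

print(x.grad)  # tensor(12.)
```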

Deep explanation

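When backward() is called, PyTorch seeds the scalar loss with a gradient of 1.0 and traverses the dynamically built computation graph in reverse topological order, starting from loss.grad_fn. Each graph node (the object referenced by a non-leaf tensor's grad_fn, such as MulBackward0) receives the gradient of the loss with respect to its output. It multiplies that gradient by its local derivative (a vector-Jacobian product), using tensors it saved during the forward pass, and passes the results along next_functions to the nodes that produced its inputs. When a path reaches a leaf tensor with requires_grad=True, an AccumulateGrad node adds the incoming gradient to that tensor's .grad field. Because it adds rather than overwrites, training loops normally clear gradients between steps. Intermediate (non-leaf) tensors do not keep a .grad unless retain_grad() is called on them. After the pass, the saved intermediate tensors are freed, so calling backward() a second time on the same graph raises an error unless the earlier call used retain_graph=True. If the output is not a scalar, backward() needs an explicit gradient argument.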
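The sketch below makes the deep explanation observable. It assumes PyTorch 2.x. The helper graph_nodes is hypothetical: it is written for this example and is not part of PyTorch, but it only reads the real grad_fn and next_functions attributes. For loss = a * b + a, the sketch does three things. It prints the graph, it shows that a second backward() adds to .grad, and it shows that a third call fails once the saved tensors have been freed.

```python
import torch

def graph_nodes(fn, depth=0, out=None):
    """Hypothetical helper: depth-first listing of autograd nodes reachable from fn.

    next_functions links each node to the nodes that produced its inputs;
    an entry is None when that input does not require grad.
    """
    if out is None:
        out = []
    if fn is None:
        return out
    out.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        graph_nodes(next_fn, depth + 1, out)
    return out

a = torch.tensor(2.0, requires_grad=True)  # leaf
b = torch.tensor(5.0, requires_grad=True)  # leaf
c = a * b        # non-leaf, produced by MulBackward0 (saves a and b for backward)
loss = c + a     # non-leaf, produced by AddBackward0

print("\n".join(graph_nodes(loss.grad_fn)))
# AddBackward0
#   MulBackward0
#     AccumulateGrad
#     AccumulateGrad
#   AccumulateGrad

# First pass: retain_graph=True keeps the saved tensors so the graph can be reused.
loss.backward(retain_graph=True)
first = (a.grad.item(), b.grad.item())   # d(loss)/da = b + 1, d(loss)/db = a

# Second pass: new gradients are added to the existing .grad values.
loss.backward()
second = (a.grad.item(), b.grad.item())

# c is an intermediate tensor: it has a grad_fn and is not a leaf.
assert c.grad_fn is not None and not c.is_leaf

# The second pass did not retain the graph, so MulBackward0's saved tensors are gone.
try:
    loss.backward()
    third_pass_ok = True
except RuntimeError:
    third_pass_ok = False

# Clear gradients before the next step (optimizer.zero_grad() sets them to None by default in 2.x).
a.grad = None
b.grad = None

print(first, second, third_pass_ok)  # (6.0, 2.0) (12.0, 4.0) False
```

Gradients accumulate across backward() calls by design. That is what lets a training loop sum gradients over several micro-batches before one optimizer step. Loops that don't want this clear .grad before each backward pass.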
