What is optimizer.zero_grad() used for?

Updated May 17, 2026

Short answer

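It clears the gradients stored on the optimizer's parameters so that the next backward pass starts from a clean state instead of adding to gradients left over from the previous iteration.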

Deep explanation

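PyTorch accumulates gradients by default: each call to backward() adds the newly computed gradients to whatever is already stored in each parameter's .grad attribute instead of overwriting it. optimizer.zero_grad() clears those stored gradients for every parameter the optimizer manages, so the next backward pass reflects only the current batch. In recent PyTorch versions it does this by setting .grad to None (set_to_none=True is the default); passing set_to_none=False fills the existing tensors with zeros instead.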
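The accumulation behavior is easy to observe on a single parameter. The following is a minimal sketch, assuming a hypothetical scalar parameter w and a toy loss of 3 * w (neither comes from the original question), whose gradient is 3 on every backward pass:

```python
import torch

# Hypothetical scalar parameter, managed by an SGD optimizer.
w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Toy loss = 3 * w, so d(loss)/dw is 3 on every backward pass.
(3 * w).backward()
first = w.grad.item()

# Second backward pass without zero_grad(): the new gradient is added
# to the one already stored in w.grad.
(3 * w).backward()
accumulated = w.grad.item()

# Clear the stored gradient, then run backward again.
optimizer.zero_grad()
(3 * w).backward()
fresh = w.grad.item()

print(first, accumulated, fresh)  # 3.0 6.0 3.0
```

The second value is double the first only because zero_grad() was skipped between the two backward passes; after clearing, the gradient returns to 3.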

Real-world example

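A standard training loop calls optimizer.zero_grad() once per iteration, before loss.backward(), and then calls optimizer.step() to apply the update. Without it, each step would use the sum of the gradients from all previous batches.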
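As an illustration, here is a small sketch of where the call usually sits in a loop. The data (y = 2x + 1), the one-layer model, the learning rate, and the step count are all hypothetical choices made for this example, not part of the original answer:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy regression data: y = 2x + 1.
x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
for step in range(200):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # compute gradients for this batch only
    optimizer.step()             # update parameters using those gradients
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

Calling zero_grad() at the top of the loop (rather than after step()) is a common convention; either position works as long as the gradients are cleared between one backward pass and the next.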

Common mistakes

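  • Forgetting to call zero_grad(), so gradients from earlier batches are silently added to the current one and each update uses a stale sum instead of the current gradient.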

Follow-up questions

  • Why does PyTorch accumulate gradients?
  • What is gradient accumulation?
