mid · PyTorch
What is optimizer.zero_grad() used for?
Updated May 17, 2026
Short answer
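It clears the gradients stored on the model's parameters so that the next loss.backward() starts from a clean slate instead of adding to the old values.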
Deep explanation
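PyTorch accumulates gradients by default. Every call to loss.backward() adds the newly computed gradients to whatever is already stored in each parameter's .grad attribute instead of overwriting it. Calling optimizer.zero_grad() clears the .grad of every parameter the optimizer manages, so the next optimizer.step() uses only the current batch's gradients. Since PyTorch 2.0 it sets .grad to None by default (set_to_none=True), which saves memory; passing set_to_none=False fills the existing gradient tensors with zeros instead.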
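The accumulation behavior can be seen with a single scalar parameter. This is a minimal sketch: the parameter w, the function 3 * w, and the SGD optimizer are illustrative choices. set_to_none=False is passed so the cleared gradient is a zero tensor rather than None, which makes it easy to print.

```python
import torch

# Illustrative setup (assumption): one scalar parameter managed by plain SGD.
w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# d(3w)/dw = 3, so the first backward pass stores 3 in w.grad.
(3 * w).backward()
first = w.grad.item()

# A second backward pass without clearing adds another 3 instead of overwriting.
(3 * w).backward()
accumulated = w.grad.item()

# zero_grad clears the stale value. With set_to_none=False the tensor is zeroed
# in place; with the default set_to_none=True, w.grad would become None.
optimizer.zero_grad(set_to_none=False)
cleared = w.grad.item()

# After clearing, a new backward pass yields only the fresh gradient.
(3 * w).backward()
fresh = w.grad.item()

print(first, accumulated, cleared, fresh)  # 3.0 6.0 0.0 3.0
```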
Real-world example
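A typical training loop calls optimizer.zero_grad() once per iteration, before loss.backward() computes new gradients and optimizer.step() applies them. The main exception is deliberate gradient accumulation, where clearing is skipped for several micro-batches on purpose.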
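Here is a minimal sketch of that ordering on a toy linear-regression problem. The synthetic data, model size, learning rate, and iteration count are illustrative assumptions, not taken from any particular project.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative toy data (assumption): y = 2x + 1 plus a little noise.
x = torch.randn(64, 1)
y = 2 * x + 1 + 0.01 * torch.randn(64, 1)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):          # one full batch per iteration
    optimizer.zero_grad()        # clear gradients left over from the previous iteration
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # write fresh gradients into each parameter's .grad
    optimizer.step()             # update parameters using those gradients

# Expected to end up close to 2 and 1.
print(model.weight.item(), model.bias.item())
```

You could also put zero_grad() at the end of the loop body, after step(). What matters is that it runs somewhere between one step() and the next backward().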
Common mistakes
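- Forgetting to call zero_grad(). Gradients from earlier batches keep adding up, so each step() applies a growing, stale sum, and training can become unstable or diverge.
- Calling zero_grad() between loss.backward() and optimizer.step(). This discards the gradients just computed, so step() sees no gradient from the current batch.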
Follow-up questions
- Why does PyTorch accumulate gradients?
- What is gradient accumulation?
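Gradient accumulation means deliberately skipping zero_grad() across several micro-batches so that their gradients sum into one larger effective batch before a single step(). The sketch below makes several assumptions: the data, the split into four micro-batches of 16, and the division of each loss by the number of micro-batches. That division makes the summed gradient equal the gradient of the mean loss over all 64 samples. The check compares the result against a full-batch gradient computed with torch.autograd.grad, which does not touch .grad.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative setup (assumption): 4 micro-batches of 16 stand in for one batch of 64.
x = torch.randn(64, 1)
y = 2 * x + 1
micro_batches = list(zip(x.split(16), y.split(16)))
accum_steps = len(micro_batches)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Reference gradient of the mean loss over all 64 samples (does not modify .grad).
(full_grad,) = torch.autograd.grad(loss_fn(model(x), y), model.weight)

optimizer.zero_grad()  # clear once per effective batch, not once per micro-batch
for xb, yb in micro_batches:
    # Scale so the summed micro-batch gradients equal the full-batch mean gradient.
    loss = loss_fn(model(xb), yb) / accum_steps
    loss.backward()    # adds into .grad because zero_grad is not called in between

matches = torch.allclose(model.weight.grad, full_grad, atol=1e-6)
optimizer.step()       # a single update using the accumulated gradient

print(matches)  # True
```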