Senior | Gradient Descent
What is Polyak averaging in Gradient Descent?
Updated May 16, 2026
Short answer
Polyak averaging smooths out noise in the optimization trajectory by returning the average of the iterates over time rather than the last one.
Deep explanation
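Instead of returning the final iterate, Polyak averaging (also known as Polyak-Ruppert averaging) returns the average of all iterates produced so far, x̄_T = (x_1 + ... + x_T) / T. The optimizer's own updates are unchanged; only the reported solution differs. Because gradient noise pushes successive iterates in different directions around the optimum, averaging cancels much of that noise, reducing the variance of the solution and often improving generalization. It is especially effective in stochastic optimization, where noisy gradients keep individual iterates fluctuating.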
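The averaging described above can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional quadratic objective 0.5 * (x - target)^2 with Gaussian gradient noise and a constant step size. The names sgd_with_polyak_averaging, make_noisy_grad, and compare_mse, along with every numeric setting, are hypothetical choices for this example and not part of any library. The average here covers x_1 through x_T and excludes the starting point, which is one common convention among several.

```python
import random


def sgd_with_polyak_averaging(grad_fn, x0, step_size, num_steps):
    """Run plain SGD and return (last_iterate, average_of_iterates)."""
    x = x0
    avg = 0.0
    for t in range(1, num_steps + 1):
        # Ordinary SGD step; the average never feeds back into the update.
        x = x - step_size * grad_fn(x)
        # Incremental running mean of x_1..x_t (x0 itself is excluded).
        avg += (x - avg) / t
    return x, avg


def make_noisy_grad(target, noise_std, rng):
    """Gradient of 0.5 * (x - target)**2 plus Gaussian noise (illustrative)."""
    def grad(x):
        return (x - target) + rng.gauss(0.0, noise_std)
    return grad


def compare_mse(num_runs=200, num_steps=1000, step_size=0.1,
                target=3.0, noise_std=1.0, seed=0):
    """Mean squared error of the last iterate vs. the Polyak average."""
    rng = random.Random(seed)
    grad = make_noisy_grad(target, noise_std, rng)
    last_sq, avg_sq = 0.0, 0.0
    for _ in range(num_runs):
        last, avg = sgd_with_polyak_averaging(grad, 0.0, step_size, num_steps)
        last_sq += (last - target) ** 2
        avg_sq += (avg - target) ** 2
    return last_sq / num_runs, avg_sq / num_runs


if __name__ == "__main__":
    last_mse, avg_mse = compare_mse()
    print(f"last-iterate MSE: {last_mse:.4f}")
    print(f"averaged MSE:     {avg_mse:.4f}")
```

With these assumed settings the averaged solution should land noticeably closer to the target than the last iterate, because the last iterate keeps bouncing around the optimum while the average cancels most of that noise. In practice, implementations often skip an initial burn-in phase or use an exponential moving average instead of the uniform mean shown here.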
Real-world example
Polyak averaging is used in distributed training to stabilize convergence.
Common mistakes
- Using only the last iterate instead of the averaged solution.
Follow-up questions
- Why does averaging help?
- Is it used in deep learning?