What is Polyak averaging in Gradient Descent?

Updated May 16, 2026

Short answer

Polyak averaging returns the average of the parameter iterates produced by gradient descent, rather than the last iterate, which smooths out noise in the final solution.

Deep explanation

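Instead of returning the final iterate, Polyak averaging (also called Polyak-Ruppert averaging) returns the average of the iterates visited during optimization. The updates themselves still use the raw iterates; the average is only read out as the solution. With noisy gradients the individual iterates keep fluctuating around the optimum, and averaging cancels much of that noise, reducing the variance of the final estimate. It is especially effective in stochastic optimization, where the averaged solution is often more stable and can generalize better than the last iterate.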
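
To make the idea concrete, here is a minimal sketch in plain Python. The toy objective f(w) = 0.5 * (w - 3)^2, the Gaussian gradient noise, the step size, and the function name sgd_with_polyak_averaging are all assumptions chosen for illustration; they do not come from any particular library.

```python
import random


def sgd_with_polyak_averaging(steps=5000, lr=0.1, noise_std=1.0, seed=0):
    """Run noisy SGD on f(w) = 0.5 * (w - 3)^2 (a hypothetical toy problem).

    Returns a tuple (last_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    w = 0.0      # current iterate, updated by plain SGD
    w_avg = 0.0  # running mean of the iterates w_1, ..., w_t

    for t in range(1, steps + 1):
        # The true gradient is (w - 3); Gaussian noise imitates minibatch sampling.
        grad = (w - 3.0) + rng.gauss(0.0, noise_std)
        w -= lr * grad

        # Incremental mean: equals (w_1 + ... + w_t) / t without storing history.
        w_avg += (w - w_avg) / t

    return w, w_avg


last, averaged = sgd_with_polyak_averaging()
print(f"last iterate:     {last:.4f}")
print(f"averaged iterate: {averaged:.4f}")
```

With a constant step size the last iterate keeps jittering around the minimizer w = 3, while the averaged iterate typically settles much closer to it. The incremental mean avoids storing every past iterate, which matters when the parameters are large.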

Real-world example

In distributed training, where gradient noise can make individual iterates erratic, averaged parameters are used to stabilize convergence.

Common mistakes

Using only the last iterate instead of the averaged solution.

Follow-up questions

Why does averaging help?
Is it used in deep learning?
