What is Polyak averaging in Gradient Descent?

Updated May 16, 2026

Short answer

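Polyak averaging (also called Polyak-Ruppert averaging) keeps a running average of the parameter iterates produced by gradient descent. It returns that average as the final solution, rather than the last iterate.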
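In symbols, the update rule stays the same and only the returned solution changes. This uses assumed notation: θ_t is the iterate after step t, η_t the step size, and g_t the (stochastic) gradient at θ_t. The recursive form shows that the average can be maintained without storing past iterates.

```latex
\theta_{t+1} = \theta_t - \eta_t\, g_t,
\qquad
\bar{\theta}_T = \frac{1}{T}\sum_{t=1}^{T} \theta_t
             = \bar{\theta}_{T-1} + \frac{1}{T}\left(\theta_T - \bar{\theta}_{T-1}\right)
```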

Deep explanation

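Instead of returning the final iterate, Polyak averaging returns the average of the parameter values visited during optimization. The gradient steps themselves are unchanged, and the average is simply tracked alongside them. With stochastic gradients, the iterates keep fluctuating around the optimum because of gradient noise. Averaging cancels much of that fluctuation and reduces the variance of the final estimate. This is why it is especially effective in stochastic optimization: with suitably chosen, slowly decaying step sizes, averaged SGD attains the optimal asymptotic convergence rate. In deep learning, weight averaging has also been reported to improve generalization in some settings.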
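The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes a 1-D quadratic objective f(θ) = 0.5·(θ − 3)² with unit Gaussian noise added to the gradient as a stand-in for minibatch noise. The function names `sgd_with_polyak_averaging` and `noisy_grad` are hypothetical. The optional `burn_in` (skipping early iterates) is a common practical choice and not part of the basic definition.

```python
import random


def sgd_with_polyak_averaging(grad, theta0, lr, steps, burn_in=0, seed=0):
    """Run plain SGD and return (last_iterate, averaged_iterate).

    The SGD update is untouched; the average is maintained incrementally:
        avg_k = avg_{k-1} + (theta - avg_{k-1}) / k
    which equals the arithmetic mean of the iterates after burn_in,
    without storing them. Returns avg = 0.0 if burn_in >= steps.
    """
    rng = random.Random(seed)
    theta = theta0
    avg = 0.0
    count = 0
    for t in range(steps):
        theta = theta - lr * grad(theta, rng)  # ordinary SGD step
        if t >= burn_in:
            count += 1
            avg += (theta - avg) / count  # running mean of iterates
    return theta, avg


def noisy_grad(theta, rng):
    # Hypothetical objective: gradient of 0.5 * (theta - 3)^2 plus
    # unit Gaussian noise, mimicking a minibatch gradient estimate.
    return (theta - 3.0) + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    last, averaged = sgd_with_polyak_averaging(
        noisy_grad, theta0=0.0, lr=0.1, steps=10_000, burn_in=1_000
    )
    print(f"last iterate:     {last:.4f}")
    print(f"averaged iterate: {averaged:.4f}")
```

With a constant step size of 0.1, the last iterate keeps wandering around the optimum at 3 with a spread of roughly ±0.2 in this setup. The averaged iterate typically lands much closer, because its spread shrinks roughly like one over the square root of the number of averaged steps.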

Real-world example

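Used in large-scale and distributed training to stabilize convergence. The averaged weights, rather than the raw weights from the last step, are used for evaluation or deployment.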
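In practice, many training setups replace the uniform average with an exponential moving average (EMA) of the weights, which gradually forgets old iterates. The sketch below shows that variant. It is a minimal example, assuming parameters stored as a plain dict of floats. The function name `ema_update` and the default decay of 0.999 are illustrative assumptions, not a specific library's API.

```python
def ema_update(avg_params, params, decay=0.999):
    """Blend the current weights into the EMA copy, in place.

    avg <- decay * avg + (1 - decay) * params
    A larger decay means a longer memory (roughly 1 / (1 - decay) steps).
    """
    for name, value in params.items():
        avg_params[name] = decay * avg_params[name] + (1.0 - decay) * value
    return avg_params


params = {"w": 0.0, "b": 0.0}
ema = dict(params)  # start the average at the initial weights
for step in range(1, 6):
    # Hypothetical stand-in for an optimizer step that changes the weights.
    params = {"w": float(step), "b": -float(step)}
    ema_update(ema, params, decay=0.5)
print(ema)  # {'w': 4.03125, 'b': -4.03125}
```

A small decay (0.5) is used here only to make the arithmetic easy to follow. Real training runs usually choose a decay close to 1 so that the average spans many steps.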

Common mistakes

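  • Using only the last iterate instead of the averaged solution.
  • Averaging from the very first step, so early iterates far from the optimum bias the average. In practice, averaging often starts after a burn-in period.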

Follow-up questions

  • Why does averaging help?
  • Is it used in deep learning?