Senior · Gradient Descent
What is Polyak's Heavy Ball method in Gradient Descent?
Updated May 16, 2026
Short answer
Polyak's Heavy Ball method adds a momentum term, a fraction beta of the previous update, to each Gradient Descent step to accelerate convergence.
Deep explanation
The Heavy Ball method adds a velocity term that accumulates an exponentially weighted sum of past gradients. Gradient components that keep the same sign reinforce each other, while components that flip sign from step to step partly cancel, which damps oscillations across ravine-like loss surfaces. Unlike vanilla Gradient Descent, the update carries inertia, so it keeps making progress along flat, low-curvature directions and converges faster on ill-conditioned problems.
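A minimal sketch of the update may help here. It assumes the common velocity form v_new = beta * v - lr * grad(x), x_new = x + v_new, which is equivalent to Polyak's x_new = x - lr * grad(x) + beta * (x - x_prev). The test function (a two-dimensional quadratic with curvatures 1 and 100) and the names `grad`, `gradient_descent`, and `heavy_ball` are illustrative choices, not part of the original answer.

```python
def grad(x):
    """Gradient of f(x) = 0.5 * (1 * x[0]**2 + 100 * x[1]**2).

    The curvatures 1 and 100 are an assumed toy example of an
    ill-conditioned (ravine-like) problem with condition number 100.
    """
    return [1.0 * x[0], 100.0 * x[1]]


def gradient_descent(x0, lr, steps):
    """Vanilla Gradient Descent: x <- x - lr * grad(x)."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def heavy_ball(x0, lr, beta, steps):
    """Heavy Ball: v <- beta * v - lr * grad(x); x <- x + v."""
    x = list(x0)
    v = [0.0] * len(x)  # velocity starts at rest
    for _ in range(steps):
        g = grad(x)  # gradient at the current iterate
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def norm(x):
    """Euclidean distance from the minimizer at the origin."""
    return sum(c * c for c in x) ** 0.5


if __name__ == "__main__":
    # Step sizes below are the textbook tuned values for a quadratic with
    # smallest curvature mu = 1 and largest curvature L = 100.
    x_gd = gradient_descent([1.0, 1.0], lr=2.0 / 101.0, steps=100)
    x_hb = heavy_ball([1.0, 1.0], lr=4.0 / 121.0, beta=(9.0 / 11.0) ** 2, steps=100)
    print("GD distance to minimum:        ", norm(x_gd))
    print("Heavy Ball distance to minimum:", norm(x_hb))
```

With these settings and the same number of steps, Heavy Ball ends far closer to the minimum than Gradient Descent. The tuned values rely on knowing the smallest and largest curvature, which is rarely the case in practice, where beta and the learning rate are usually found by search.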
Real-world example
Momentum SGD, the stochastic form of Heavy Ball, is used to train deep neural networks whose loss surfaces have long, narrow valleys.
Common mistakes
- Confusing the Heavy Ball method with Nesterov acceleration: they differ in where the gradient is evaluated. Heavy Ball evaluates it at the current iterate, Nesterov at a look-ahead point.
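The difference between the two methods can be sketched as a single changed line. The code below assumes scalar parameters and the widely used velocity formulation of Nesterov momentum; the function names `heavy_ball_step` and `nesterov_step` are hypothetical and chosen only for this illustration.

```python
def heavy_ball_step(x, v, grad, lr, beta):
    """One Heavy Ball step: gradient taken at the current point x."""
    g = grad(x)
    v_new = beta * v - lr * g
    return x + v_new, v_new


def nesterov_step(x, v, grad, lr, beta):
    """One Nesterov step: gradient taken at the look-ahead point x + beta * v."""
    g = grad(x + beta * v)
    v_new = beta * v - lr * g
    return x + v_new, v_new


if __name__ == "__main__":
    # Assumed toy objective f(x) = x**2, whose gradient is 2 * x.
    def grad_f(x):
        return 2.0 * x

    print(heavy_ball_step(1.0, -0.5, grad_f, lr=0.1, beta=0.9))
    print(nesterov_step(1.0, -0.5, grad_f, lr=0.1, beta=0.9))
```

When the velocity is zero, as on the very first step, the look-ahead point coincides with the current point and the two steps are identical. They diverge only once momentum has built up.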
Follow-up questions
- How is it different from SGD?
- What role does beta play?