What is momentum in Gradient Descent?
Updated May 16, 2026
Short answer
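Momentum accelerates Gradient Descent by accumulating an exponentially decaying sum of past gradients and stepping along that accumulated direction rather than along the current gradient alone.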
Deep explanation
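Momentum smooths updates by combining the current gradient with a fraction of the previous update. Gradient components that keep flipping sign (across the steep walls of a ravine) largely cancel out, while components that keep pointing the same way (along the ravine floor) add up. The result is less oscillation and faster convergence in ravines, that is, regions where the loss surface curves much more steeply in one direction than in another.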
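To make the update concrete, here is a minimal sketch of one common formulation: v = beta * v + gradient, then w = w - lr * v. The names descend, lr and beta, the toy loss 0.5 * (x^2 + 100 * y^2), and all numeric values are assumptions chosen for illustration; they do not come from any particular library. Setting beta to 0 recovers plain Gradient Descent, so the same function can be used to compare the two.

```python
def loss(w):
    # Toy "ravine": curvature is 100x steeper along y than along x.
    x, y = w
    return 0.5 * (x * x + 100.0 * y * y)


def grad(w):
    # Analytic gradient of the toy loss above.
    x, y = w
    return (x, 100.0 * y)


def descend(w0, lr, beta, steps):
    """Gradient descent with momentum; beta=0.0 gives plain gradient descent."""
    w = list(w0)
    v = [0.0, 0.0]  # velocity: decaying sum of past gradients
    for _ in range(steps):
        g = grad(w)
        for i in range(2):
            v[i] = beta * v[i] + g[i]  # accumulate past gradients
            w[i] -= lr * v[i]          # step along the accumulated direction
    return tuple(w)


if __name__ == "__main__":
    start = (10.0, 1.0)
    plain = descend(start, lr=0.015, beta=0.0, steps=200)
    with_momentum = descend(start, lr=0.015, beta=0.9, steps=200)
    print("plain GD loss:      ", loss(plain))
    print("with momentum loss: ", loss(with_momentum))
```

In this setup the learning rate has to stay small to remain stable in the steep y direction, which makes plain Gradient Descent slow along the shallow x direction. Momentum builds up speed along x, so after the same number of steps it ends at a lower loss.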
Real-world example
Training deep neural networks: SGD with momentum often reaches a given loss in fewer iterations than plain SGD.
Common mistakes
- Setting momentum too high (beta very close to 1), which makes updates overshoot the minimum and oscillate, causing instability.
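The effect of an overly high momentum value can be seen on a one-dimensional toy problem. The sketch below assumes the loss f(w) = 0.5 * w^2, a learning rate of 0.1, and the hypothetical helper names trajectory and tail_peak. The value beta = 1.05 is not a sensible setting and is included only to show outright divergence.

```python
def trajectory(beta, lr=0.1, steps=200, w0=1.0):
    """Run momentum on f(w) = 0.5 * w**2 and return every iterate."""
    w, v = w0, 0.0
    path = [w]
    for _ in range(steps):
        v = beta * v + w  # the gradient of 0.5 * w**2 is w
        w = w - lr * v
        path.append(w)
    return path


def tail_peak(path, n=50):
    # Largest distance from the minimum (w = 0) over the last n iterates.
    return max(abs(w) for w in path[-n:])


if __name__ == "__main__":
    for beta in (0.9, 0.99, 1.05):
        print(f"beta={beta}: late-stage peak |w| = {tail_peak(trajectory(beta)):.3g}")
```

With these assumed settings, beta = 0.9 settles close to the minimum, beta = 0.99 is still swinging back and forth around it after 200 steps, and beta = 1.05 grows without bound.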
Follow-up questions
- What is beta in momentum?
- How does momentum differ from plain SGD?