Senior | Gradient Descent
What is gradient descent with momentum as a dynamical system?
Updated May 16, 2026
Short answer
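Gradient descent with momentum can be modeled as a damped mechanical system: the parameters act like a particle with inertia that slides over the loss surface while friction dissipates its energy.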
Deep explanation
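Momentum adds a velocity term to the update, so the parameters behave like a particle with mass moving through a potential field (the loss surface). The negative gradient acts as the force, the accumulated velocity supplies inertia, and the momentum coefficient sets the friction: the closer it is to 1, the weaker the damping. Near a minimum, where the loss is approximately quadratic, the dynamics are those of a damped harmonic oscillator. This picture explains why momentum reduces zigzagging and converges faster in ravines: gradient components that flip sign from step to step partly cancel in the velocity, while components that keep the same sign accumulate.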
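The correspondence can be written out explicitly. The following is a heuristic sketch, not a rigorous derivation. It assumes the heavy-ball form of the update, and all symbols are assumptions introduced here for illustration: \theta for the parameters, v for the velocity, L for the loss, \eta for the learning rate, \beta for the momentum coefficient, and h for a notional time step.

```latex
% Heavy-ball update (one common convention)
v_{t+1} = \beta v_t - \eta \nabla L(\theta_t), \qquad
\theta_{t+1} = \theta_t + v_{t+1}

% Eliminate v and regroup into second and first differences
(\theta_{t+1} - 2\theta_t + \theta_{t-1})
  + (1 - \beta)(\theta_t - \theta_{t-1})
  + \eta \nabla L(\theta_t) = 0

% Read the differences as derivatives over a time step h
m\,\ddot{\theta} + \gamma\,\dot{\theta} + \nabla L(\theta) = 0,
\qquad m = \frac{h^2}{\eta}, \quad \gamma = \frac{(1 - \beta)\,h}{\eta}

% Quadratic loss L = \tfrac{1}{2} k \theta^2: a damped harmonic oscillator
m\,\ddot{\theta} + \gamma\,\dot{\theta} + k\,\theta = 0
```

Under this reading the friction coefficient \gamma is proportional to 1 - \beta, so \beta = 0 corresponds to heavy friction and \beta close to 1 to a lightly damped system that can overshoot and ring before settling.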
Real-world example
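When a deep neural network is trained with SGD plus momentum, the parameters often behave like damped motion: they can overshoot a minimum, swing back, and settle toward a stable equilibrium as the oscillation decays.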
Common mistakes
- Thinking momentum only speeds up training; it also stabilizes oscillations.
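The point in the list above can be checked on a toy problem. The sketch below is a minimal illustration in plain Python, not a production optimizer. It assumes a two-dimensional quadratic "ravine" with curvatures 1 and 50, and the names `heavy_ball` and `distance_to_minimum`, the learning rate, and the momentum value are all hypothetical choices made for the example.

```python
def heavy_ball(curvatures, x0, lr, beta, steps):
    """Run heavy-ball momentum on f(x) = 0.5 * sum(k_i * x_i**2).

    With beta = 0 this reduces to plain gradient descent.
    """
    x = list(x0)
    v = [0.0] * len(x)  # the particle starts at rest
    for _ in range(steps):
        # Gradient of the quadratic; its negative plays the role of the force.
        grad = [k * xi for k, xi in zip(curvatures, x)]
        # beta < 1 acts like friction on the old velocity.
        v = [beta * vi - lr * gi for vi, gi in zip(v, grad)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def distance_to_minimum(x):
    """Euclidean distance to the minimum at the origin."""
    return sum(xi * xi for xi in x) ** 0.5


# Assumed toy setup: a shallow direction (k = 1) and a steep one (k = 50).
curvatures = [1.0, 50.0]
start = [10.0, 1.0]

x_gd = heavy_ball(curvatures, start, lr=0.03, beta=0.0, steps=200)
x_hb = heavy_ball(curvatures, start, lr=0.03, beta=0.9, steps=200)

dist_gd = distance_to_minimum(x_gd)
dist_hb = distance_to_minimum(x_hb)

print(f"plain gradient descent: distance to minimum = {dist_gd:.3e}")
print(f"with momentum 0.9:      distance to minimum = {dist_hb:.3e}")
```

In this setup plain gradient descent is held back by the shallow direction, which shrinks by only a factor of 0.97 per step. With momentum, both directions are underdamped and contract at roughly the same rate, about the square root of the momentum coefficient per step, so the run ends much closer to the minimum. Pushing the momentum coefficient nearer to 1 weakens the damping and makes the trajectory ring for longer.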
Follow-up questions
- What is damping in this analogy?
- Why is this model useful?