What is gradient descent with momentum as a dynamical system?

Updated May 16, 2026

Short answer

Gradient descent with momentum can be modeled as a damped physical system: a particle with mass sliding over the loss surface under friction.

Deep explanation

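Momentum introduces a velocity term, so the update behaves like a discretized damped oscillator (a damped harmonic oscillator when the loss is quadratic). The parameters act like a particle moving through a potential field (the loss surface): the negative gradient is the force, the accumulated velocity supplies inertia, and a momentum coefficient below 1 plays the role of friction. This interpretation explains why momentum reduces zig-zagging across the steep walls of a ravine and speeds up progress along its floor.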
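
One way to make the analogy concrete is to write out the heavy-ball form of the update and take a continuous-time limit. The notation is chosen here for illustration and is not taken from any particular library: θ for the parameters, v for the velocity, η for the learning rate, β for the momentum coefficient, and h for an assumed time step between iterations. Treat this as a sketch under those assumptions.

```latex
% Heavy-ball momentum update
\begin{aligned}
v_{t+1}      &= \beta\, v_t - \eta\, \nabla L(\theta_t), \\
\theta_{t+1} &= \theta_t + v_{t+1}.
\end{aligned}

% Eliminating v (using v_t = \theta_t - \theta_{t-1}) and regrouping:
\beta\,(\theta_{t+1} - 2\theta_t + \theta_{t-1})
  + (1-\beta)\,(\theta_{t+1} - \theta_t)
  = -\eta\, \nabla L(\theta_t).

% With \theta_{t+1} - 2\theta_t + \theta_{t-1} \approx h^2 \ddot\theta
% and  \theta_{t+1} - \theta_t \approx h\, \dot\theta :
m\, \ddot\theta + c\, \dot\theta = -\nabla L(\theta),
\qquad m = \frac{\beta h^2}{\eta}, \quad c = \frac{(1-\beta)\, h}{\eta}.

% For a quadratic loss L(\theta) = \tfrac{k}{2}\theta^2 this is the
% damped harmonic oscillator  m\ddot\theta + c\dot\theta + k\theta = 0.
```

Under this reading, raising β increases the effective mass m and lowers the friction c, which is why a coefficient close to 1 carries the iterate further but also lets it overshoot more.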

Real-world example

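When a deep neural network is trained with SGD plus momentum, the parameters tend to behave like a damped particle: they may overshoot and oscillate early in training, then settle toward a stable equilibrium such as a local minimum or a flat region of the loss.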
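
A real network is too large to inspect directly, so the following minimal sketch swaps in a toy two-parameter "ravine" loss (shallow in one direction, steep in the other). The loss function, the starting point, and the names `CURVATURE`, `run`, `lr`, and `beta` are all assumptions made for illustration. It compares plain gradient descent (β = 0) with heavy-ball momentum at the same learning rate.

```python
import numpy as np

# Hypothetical ravine: L(x, y) = 0.5 * (1 * x**2 + 100 * y**2)
CURVATURE = np.array([1.0, 100.0])


def loss(theta):
    """Quadratic loss with very different curvature per axis."""
    return 0.5 * float(np.sum(CURVATURE * theta ** 2))


def grad(theta):
    """Gradient of the quadratic loss above."""
    return CURVATURE * theta


def run(beta, lr=0.018, steps=200):
    """Heavy-ball update; beta=0.0 reduces to plain gradient descent."""
    theta = np.array([1.0, 1.0])      # assumed starting point
    velocity = np.zeros_like(theta)   # particle starts at rest
    for _ in range(steps):
        # Force (negative gradient) changes the velocity; beta < 1 is friction.
        velocity = beta * velocity - lr * grad(theta)
        theta = theta + velocity
    return loss(theta)


loss_plain = run(beta=0.0)
loss_momentum = run(beta=0.9)
print(f"plain GD final loss:  {loss_plain:.3e}")
print(f"momentum final loss:  {loss_momentum:.3e}")
```

In this toy setting the learning rate is capped by the steep direction, so plain gradient descent creeps along the shallow one, while the accumulated velocity lets momentum cover that distance much faster. The exact gap depends on the assumed curvatures and hyperparameters.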

Common mistakes

  • Thinking momentum only speeds up training; it also stabilizes oscillations.
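
How much oscillation remains depends on the momentum coefficient. As a rough sketch, assuming a one-dimensional quadratic loss and the hypothetical helper `zero_crossings`, the snippet below counts how often the parameter crosses the minimum at 0. With more friction (smaller β) the iterate settles without overshooting; with β close to 1 it rings around the minimum before settling.

```python
def zero_crossings(beta, lr=0.01, curvature=1.0, steps=300):
    """Count sign changes of theta under heavy-ball momentum on
    the assumed loss L(theta) = 0.5 * curvature * theta**2."""
    theta, velocity = 1.0, 0.0
    crossings = 0
    for _ in range(steps):
        velocity = beta * velocity - lr * curvature * theta
        new_theta = theta + velocity
        if new_theta * theta < 0:   # passed through the minimum
            crossings += 1
        theta = new_theta
    return crossings


heavy_damping = zero_crossings(beta=0.5)    # more friction
light_damping = zero_crossings(beta=0.95)   # less friction
print("crossings with beta=0.5: ", heavy_damping)
print("crossings with beta=0.95:", light_damping)
```

The thresholds between these regimes depend on the learning rate and curvature assumed here, so the specific β values are illustrative rather than recommendations.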

Follow-up questions

  • What is damping in this analogy?
  • Why is this model useful?
