What is the stochastic differential equation view of gradient descent?

Updated May 16, 2026

Short answer

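Stochastic gradient descent (SGD) can be approximated by a stochastic differential equation (SDE). This is a continuous-time process whose deterministic drift follows the negative gradient and whose diffusion term models the minibatch gradient noise.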

Deep explanation

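In the continuous-time view, SGD with a small learning rate behaves like a system with two parts. The deterministic drift is the negative gradient, as in gradient flow. The stochastic diffusion is the noise from estimating the gradient on a minibatch, and its strength grows with the learning rate and shrinks with the batch size. The resulting equation resembles Langevin dynamics from statistical physics, which connects optimization with physics. It also gives tools to analyze generalization: a commonly proposed explanation is that this noise helps the iterates escape sharp minima and settle in flatter ones.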
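The drift and diffusion terms can be read off by comparing one SGD step with one Euler–Maruyama step of an SDE. The sketch below is a minimal derivation under stated assumptions. It assumes the minibatch gradient noise ξ_k has mean zero and covariance approximately Σ(θ)/B, where B is the batch size. The symbols ξ_k, Σ and B are notation introduced here and do not come from the original text. The approximation is generally justified only for small learning rates η.

```latex
% SGD update with a noisy minibatch gradient (assumed noise model)
\theta_{k+1} = \theta_k - \eta\,\hat g_k,\qquad
\hat g_k = \nabla L(\theta_k) + \xi_k,\qquad
\mathbb{E}[\xi_k] = 0,\quad
\operatorname{Cov}[\xi_k] \approx \tfrac{1}{B}\,\Sigma(\theta_k)

% Split the increment into a drift part and a noise part, with step size \Delta t = \eta
\theta_{k+1} - \theta_k = -\nabla L(\theta_k)\,\Delta t \;-\; \eta\,\xi_k,\qquad
\operatorname{Cov}[\eta\,\xi_k] \approx \Delta t \cdot \tfrac{\eta}{B}\,\Sigma(\theta_k)

% An Euler--Maruyama step of the SDE below has exactly this drift and noise covariance (t \approx k\eta)
d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\tfrac{\eta}{B}\,\Sigma(\theta_t)}\;dW_t
```

The drift term is ordinary gradient flow. The diffusion term has the matrix square root of (η/B)Σ as its coefficient, so in this approximation the learning rate and batch size enter the noise only through the ratio η/B.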

Real-world example

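Modeling neural network training dynamics as a diffusion process. One use is to estimate how the learning rate and batch size control how widely the weights fluctuate around a minimum.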
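Here is a toy, numerically checkable version of this kind of modeling. It is a minimal sketch, not a real training setup. The hypothetical problem is a 1-D quadratic loss L(θ) = a/2 · θ², and per-example gradient noise is assumed to be Gaussian with standard deviation s. The script runs discrete SGD over many independent chains. It also integrates the corresponding SDE with a finer Euler–Maruyama step over the same time span. Both spreads are then compared with the SDE's stationary variance η s² / (2aB). All constants and function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy problem: L(theta) = A/2 * theta^2 in 1-D.
# Assumption: per-example gradient noise is Gaussian with std S,
# so a minibatch of size BATCH has gradient-noise variance S^2 / BATCH.
A, S, ETA, BATCH = 1.0, 2.0, 0.05, 8
CHAINS, SGD_STEPS = 10_000, 400


def run_sgd(rng):
    """Discrete SGD, vectorized over independent chains started at theta = 0."""
    theta = np.zeros(CHAINS)
    for _ in range(SGD_STEPS):
        noisy_grad = A * theta + rng.normal(0.0, S / np.sqrt(BATCH), CHAINS)
        theta = theta - ETA * noisy_grad
    return theta


def run_sde(rng, substeps=10):
    """Euler-Maruyama for d theta = -A*theta dt + sqrt(ETA * S^2 / BATCH) dW.

    Integrates over the same time span t = SGD_STEPS * ETA as the SGD run,
    using a step ETA / substeps that is finer than the SGD step.
    """
    dt = ETA / substeps
    diffusion = np.sqrt(ETA * S**2 / BATCH)
    theta = np.zeros(CHAINS)
    for _ in range(SGD_STEPS * substeps):
        dW = rng.normal(0.0, np.sqrt(dt), CHAINS)  # Brownian increment ~ N(0, dt)
        theta = theta - A * theta * dt + diffusion * dW
    return theta


rng = np.random.default_rng(0)
sgd_var = run_sgd(rng).var()
sde_var = run_sde(rng).var()
# Stationary variance of this SDE (an Ornstein-Uhlenbeck process).
theory_var = ETA * S**2 / (2 * A * BATCH)
print(f"SGD: {sgd_var:.5f}  SDE: {sde_var:.5f}  theory: {theory_var:.5f}")
```

The two spreads should come out close to each other and to the theoretical value. They do not match exactly. For this quadratic, the exact stationary variance of discrete SGD is η s² / (B a (2 − ηa)), which approaches the SDE value only as ηa becomes small. This gap is a simple illustration of why the SDE view is an approximation for small learning rates.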

Common mistakes

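  • Treating stochasticity as a mere computational artifact, when the gradient noise itself shapes the training dynamics and can influence which minima SGD settles into.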

Follow-up questions

  • What is the diffusion term?
  • Why is this perspective useful?
