Senior | Gradient Descent
What is the stochastic differential equation view of gradient descent?
Updated May 16, 2026
Short answer
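For small learning rates, SGD can be approximated by a stochastic differential equation (SDE): the full-batch gradient supplies the deterministic drift, and the minibatch sampling noise supplies the random diffusion.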
Deep explanation
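In the continuous-time view, SGD behaves like a system with a deterministic drift (the negative gradient of the full loss) and a stochastic diffusion (the minibatch gradient noise, whose strength grows with the learning rate and shrinks with the batch size). This connects optimization with statistical physics, where similar equations describe Langevin dynamics, and it helps analyze generalization behavior, for example which minima the noise lets the iterates escape from or settle into.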
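The correspondence can be written out as a short sketch. The symbols here are assumptions introduced for illustration and are not from the original text: \eta is the learning rate, B the batch size, L the full loss, \hat{g}_B the minibatch gradient, \Sigma(\theta) the covariance of per-example gradients, and W_t a standard Wiener process. The first line is the discrete SGD step; the second is the commonly used first-order SDE approximation, which is only expected to hold for small \eta.

```latex
\begin{align}
\theta_{k+1} &= \theta_k - \eta\, \hat{g}_B(\theta_k),
  \qquad \hat{g}_B(\theta) = \nabla L(\theta) + \xi,
  \quad \mathbb{E}[\xi] = 0,
  \quad \operatorname{Cov}[\xi] = \tfrac{1}{B}\,\Sigma(\theta) \\
d\theta_t &= \underbrace{-\nabla L(\theta_t)\, dt}_{\text{drift}}
  \;+\; \underbrace{\sqrt{\tfrac{\eta}{B}}\;\Sigma(\theta_t)^{1/2}\, dW_t}_{\text{diffusion}}
\end{align}
```

The scaling follows from identifying one SGD step with a time increment dt = \eta: the per-step noise \eta\xi has covariance \eta^2 \Sigma / B = (\eta / B)\,\Sigma\, dt, which is the covariance of the diffusion term over the same increment.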
Real-world example
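Modeling neural network training dynamics as a diffusion process, for example to reason about how the ratio of learning rate to batch size sets the noise level during training.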
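A toy version of this comparison can be run directly. The sketch below is not a neural network: it assumes a one-dimensional quadratic loss L(theta) = 0.5 * a * theta**2 with additive Gaussian gradient noise, and all names (`simulate_sgd`, `simulate_sde`, `noise_std`, `substeps`) are hypothetical, chosen for illustration. It runs many independent SGD chains, then integrates the matching SDE with the Euler-Maruyama scheme, and compares the spread of the final iterates with the stationary variance of the resulting Ornstein-Uhlenbeck process.

```python
import numpy as np


def simulate_sgd(a, noise_std, lr, n_steps, n_chains, rng):
    """Noisy gradient descent on L(theta) = 0.5 * a * theta**2, many chains at once."""
    theta = np.ones(n_chains)
    for _ in range(n_steps):
        # Assumed noise model: true gradient plus zero-mean Gaussian noise.
        grad = a * theta + noise_std * rng.standard_normal(n_chains)
        theta = theta - lr * grad
    return theta


def simulate_sde(a, noise_std, lr, n_steps, n_chains, rng, substeps=10):
    """Euler-Maruyama for d(theta) = -a*theta dt + sqrt(lr)*noise_std dW.

    One SGD step corresponds to a time increment of lr, so the SDE is
    integrated up to time n_steps * lr using a finer step lr / substeps.
    """
    theta = np.ones(n_chains)
    dt = lr / substeps
    diffusion = np.sqrt(lr) * noise_std
    for _ in range(n_steps * substeps):
        dw = np.sqrt(dt) * rng.standard_normal(n_chains)
        theta = theta - a * theta * dt + diffusion * dw
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, sigma, batch, lr = 1.0, 2.0, 4, 0.05
    noise_std = sigma / np.sqrt(batch)  # minibatch noise shrinks like 1/sqrt(B)

    sgd_var = simulate_sgd(a, noise_std, lr, 300, 10_000, rng).var()
    sde_var = simulate_sde(a, noise_std, lr, 300, 10_000, rng).var()
    ou_var = lr * noise_std**2 / (2 * a)  # stationary variance of the OU process

    print(f"SGD variance:        {sgd_var:.5f}")
    print(f"SDE variance:        {sde_var:.5f}")
    print(f"OU stationary value: {ou_var:.5f}")
```

Under these assumptions the three numbers should be close for small learning rates and drift apart as `lr * a` grows, since the SDE is only a small-step approximation of the discrete iteration.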
Common mistakes
- Dismissing the stochasticity as a mere computational artifact, rather than treating it as part of the dynamics.
Follow-up questions
- What is the diffusion term?
- Why is this perspective useful?