How does stochastic gradient descent converge to a continuous stochastic differential equation?
Updated May 15, 2026
Short answer
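For a small learning rate, the SGD iterates are closely approximated by the solution of a stochastic differential equation (SDE) resembling Langevin dynamics: the negative loss gradient supplies the drift, and minibatch gradient noise supplies the diffusion.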
Deep explanation
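When the learning rate η is small and minibatch sampling makes each gradient estimate random, the SGD iterates track a continuous-time stochastic process on the time scale t = kη, where k is the step count. Each discrete update splits into a drift term (the negative gradient of the full loss) and a diffusion term (the gradient noise), whose magnitude scales with √η and shrinks as the batch size grows. If η is taken all the way to zero with the batch size fixed, the diffusion term vanishes and only deterministic gradient flow remains, so the SDE is best read as an approximation for small but finite η. This SDE interpretation offers one explanation for why SGD can escape sharp minima, and why gradient noise is not purely harmful but helps the iterates explore the loss landscape.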
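To make the drift-plus-diffusion picture concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption chosen for illustration: a one-dimensional quadratic loss L(θ) = 0.5·a·θ², additive Gaussian gradient noise of fixed scale sigma standing in for minibatch noise, and the hypothetical helper names simulate_sgd, sde_stationary_variance and sgd_stationary_variance. Under those assumptions the approximating SDE is an Ornstein-Uhlenbeck process, dθ = −aθ dt + √η·σ dW, whose stationary variance is known in closed form and can be compared with what SGD actually produces.

```python
import numpy as np


def simulate_sgd(eta, a=1.0, sigma=1.0, theta0=1.0,
                 n_steps=2000, n_paths=20000, seed=0):
    """Run n_paths independent SGD chains on L(theta) = 0.5 * a * theta**2.

    Each step uses a noisy gradient a*theta + sigma*xi with xi ~ N(0, 1).
    This additive noise is a toy stand-in for minibatch noise (an
    assumption of this sketch, not a model of any real dataset).
    """
    rng = np.random.default_rng(seed)
    theta = np.full(n_paths, theta0, dtype=float)
    for _ in range(n_steps):
        noisy_grad = a * theta + sigma * rng.standard_normal(n_paths)
        theta -= eta * noisy_grad
    return theta


def sde_stationary_variance(eta, a=1.0, sigma=1.0):
    # The Ornstein-Uhlenbeck SDE  d(theta) = -a*theta dt + sqrt(eta)*sigma dW
    # has stationary variance (sqrt(eta)*sigma)**2 / (2*a).
    return eta * sigma**2 / (2.0 * a)


def sgd_stationary_variance(eta, a=1.0, sigma=1.0):
    # Fixed point of Var_{k+1} = (1 - eta*a)**2 * Var_k + eta**2 * sigma**2,
    # i.e. the exact stationary variance of the discrete SGD recursion.
    return eta * sigma**2 / (a * (2.0 - eta * a))


if __name__ == "__main__":
    for eta in (0.1, 0.01):
        # Run to continuous time t = 20 so the chains are close to stationary.
        final = simulate_sgd(eta, n_steps=round(20 / eta))
        print(f"eta={eta}: empirical={final.var():.6f}  "
              f"SDE={sde_stationary_variance(eta):.6f}  "
              f"exact SGD={sgd_stationary_variance(eta):.6f}")
```

In this toy setting the exact SGD variance and the SDE prediction differ by the factor 2 / (2 − ηa), so the gap closes as η shrinks, which is the sense in which the SDE is a small-learning-rate approximation. Note also that an Euler-Maruyama discretisation of this SDE with step size η reproduces the noisy update above exactly, which is one way to see where the √η scaling of the diffusion term comes from.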