How does stochastic gradient descent converge to a continuous stochastic differential equation?

Updated May 15, 2026

Short answer

In the small-learning-rate limit, SGD is approximated by a stochastic differential equation (SDE) resembling Langevin dynamics: a gradient-flow drift plus a diffusion term driven by minibatch gradient noise.
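Written out, one standard form of this limit is the following sketch. It assumes the minibatch gradient noise ξ_k has mean zero and covariance Σ(θ)/B, where Σ is the per-example gradient covariance and B the batch size. The symbols ξ_k, Σ, B and W_t are introduced here for illustration.

```latex
\theta_{k+1} = \theta_k - \eta\,\nabla L(\theta_k) - \eta\,\xi_k,
\qquad \mathbb{E}[\xi_k] = 0,\quad \operatorname{Cov}(\xi_k) = \tfrac{1}{B}\,\Sigma(\theta_k)

d\Theta_t = -\nabla L(\Theta_t)\,dt + \sqrt{\tfrac{\eta}{B}}\;\Sigma(\Theta_t)^{1/2}\,dW_t,
\qquad t = k\eta
```

Stepping this SDE with Euler–Maruyama at step size dt = η gives Θ_{k+1} = Θ_k − η∇L(Θ_k) + η(Σ/B)^{1/2} z_k with z_k standard normal. This is an SGD update whose noise has the same mean and covariance as −ηξ_k. Langevin dynamics, dΘ = −∇L dt + √(2T) dW, is the special case with constant, isotropic diffusion. SGD's noise is generally anisotropic and depends on θ, which is why the resemblance is only approximate.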

Deep explanation

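When the learning rate η is small, SGD dynamics are well approximated by a continuous-time stochastic process. Identify step k with time t = kη, and treat each minibatch gradient as the full gradient plus zero-mean sampling noise. The full-gradient part of each update becomes the drift term (the negative gradient of the loss). The sampling noise becomes the diffusion term, whose magnitude scales with √(η/B) for batch size B. The noise must be kept at this order: letting η go to zero outright would shrink it away and leave only deterministic gradient flow. The approximation holds in a weak (distributional) sense over a finite time horizon, not path by path. This SDE view offers one explanation for why SGD tends to escape sharp minima. Noise of a given scale is more likely to push the iterate out of a narrow basin than a wide one, and that scale grows with the ratio η/B. On this view, gradient noise is not purely harmful; it plays a structural role in exploring the loss landscape.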
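To make the drift and diffusion terms concrete, the toy simulation below is a sketch, not a general implementation. It runs SGD on the one-dimensional quadratic loss L(θ) = aθ²/2. The minibatch gradient is modeled as aθ plus Gaussian noise of standard deviation σ, an assumption standing in for real batch sampling. The simulation also integrates the matching SDE dΘ = −aΘ dt + √η σ dW with Euler–Maruyama. The helpers `sgd_quadratic`, `euler_maruyama_sde` and `tail_variance` are hypothetical names for this illustration. Both processes should settle to a stationary variance close to ησ²/(2a), so the residual noise the iterate carries is set by the learning rate.

```python
import math
import random


def sgd_quadratic(a, sigma, eta, steps, theta0=0.0, seed=0):
    """SGD on the toy loss L(theta) = a * theta**2 / 2.

    Assumption: the minibatch gradient is modeled as the exact gradient
    a * theta plus N(0, sigma**2) noise, standing in for batch sampling.
    Returns the iterates theta_0, ..., theta_steps.
    """
    rng = random.Random(seed)
    theta = theta0
    path = [theta]
    for _ in range(steps):
        noisy_grad = a * theta + rng.gauss(0.0, sigma)  # exact gradient + noise
        theta -= eta * noisy_grad
        path.append(theta)
    return path


def euler_maruyama_sde(a, sigma, eta, t_end, dt, theta0=0.0, seed=1):
    """Euler-Maruyama for dTheta = -a * Theta dt + sqrt(eta) * sigma dW."""
    rng = random.Random(seed)
    theta = theta0
    path = [theta]
    for _ in range(int(round(t_end / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment with variance dt
        theta += -a * theta * dt + math.sqrt(eta) * sigma * dw  # drift + diffusion
        path.append(theta)
    return path


def tail_variance(path, burn_in):
    """Sample variance of the path after discarding the first burn_in points."""
    tail = path[burn_in:]
    mean = sum(tail) / len(tail)
    return sum((x - mean) ** 2 for x in tail) / len(tail)


if __name__ == "__main__":
    a, sigma, eta, steps = 1.0, 1.0, 0.01, 200_000
    sgd_path = sgd_quadratic(a, sigma, eta, steps)
    # Same time horizon t = steps * eta, stepped with dt = eta.
    sde_path = euler_maruyama_sde(a, sigma, eta, t_end=steps * eta, dt=eta)
    print("SGD variance:", tail_variance(sgd_path, 1_000))
    print("SDE variance:", tail_variance(sde_path, 1_000))
    print("SDE theory:  ", eta * sigma**2 / (2 * a))
```

With dt = η, each Euler–Maruyama step has the same form as an SGD step, θ ← θ − η(aθ + σz), up to the sign of the Gaussian draw z. This is the sense in which SGD is a discretization of the SDE. The exact stationary variance of the SGD recursion is ησ²/(2a − ηa²), which approaches the SDE value ησ²/(2a) as η shrinks. Raising η raises the stationary variance roughly in proportion.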
