How does stochastic gradient descent converge to a continuous stochastic differential equation?

Updated May 15, 2026

Short answer

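As the learning rate shrinks toward zero, the discrete SGD iterates are approximated by the solution of a stochastic differential equation (SDE) that resembles Langevin dynamics: the drift follows the negative gradient of the loss, and the diffusion comes from minibatch gradient noise.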
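
One common way to write this limit is sketched below. The symbols are assumptions introduced here for illustration, not notation from the original answer: \theta is the parameter vector, \eta the learning rate, L the loss, \hat g the minibatch gradient, \Sigma(\theta) its covariance, and W_t a standard Brownian motion. The correspondence holds in distribution for small \eta, not path by path.

```latex
% Discrete SGD step with an unbiased minibatch gradient
\theta_{k+1} = \theta_k - \eta\, \hat g(\theta_k), \qquad
\mathbb{E}[\hat g(\theta)] = \nabla L(\theta), \quad
\operatorname{Cov}[\hat g(\theta)] = \Sigma(\theta)

% Continuous-time approximation, with time t = k\eta
d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta}\, \Sigma(\theta_t)^{1/2}\, dW_t
```

Classical Langevin dynamics has a constant, isotropic diffusion coefficient. Here the noise depends on \theta through \Sigma(\theta), which is why the SDE only resembles Langevin dynamics.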

Deep explanation

Each SGD step uses a minibatch gradient, which equals the full-batch gradient plus zero-mean sampling noise. When the learning rate is small, many such steps accumulate into a continuous-time stochastic process: the full-gradient part becomes the drift term (the negative gradient of the loss), and the sampling noise becomes the diffusion term, whose magnitude grows with the learning rate and shrinks with the batch size. This SDE view is commonly used to explain why SGD can escape sharp minima, and why the noise is not purely harmful but structurally important for exploring the loss landscape.
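
The drift-plus-diffusion picture can be checked numerically on a toy problem. The sketch below is a minimal illustration under stated assumptions, not a general implementation: the loss is a one-dimensional quadratic L(theta) = 0.5 * a * theta^2, the minibatch noise is modelled as additive Gaussian noise with standard deviation sigma, and all function and parameter names are hypothetical. Under these assumptions the approximating SDE is an Ornstein-Uhlenbeck process whose stationary variance is eta * sigma^2 / (2 * a), so the spread of many independent SGD runs should land close to that value.

```python
import random


def simulate_sgd_variance(a=1.0, sigma=1.0, eta=0.05,
                          n_steps=500, n_chains=4000, seed=0):
    """Run independent noisy-gradient SGD chains on L(theta) = 0.5 * a * theta**2
    and return the variance of the final iterates across chains."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_chains):
        theta = 0.0
        for _ in range(n_steps):
            # Assumed noise model: true gradient a * theta plus Gaussian noise.
            noisy_grad = a * theta + sigma * rng.gauss(0.0, 1.0)
            theta -= eta * noisy_grad
        finals.append(theta)
    mean = sum(finals) / n_chains
    return sum((x - mean) ** 2 for x in finals) / n_chains


def sde_stationary_variance(a=1.0, sigma=1.0, eta=0.05):
    """Stationary variance of d(theta) = -a*theta dt + sqrt(eta)*sigma dW."""
    return eta * sigma ** 2 / (2 * a)


if __name__ == "__main__":
    print("SGD empirical variance:", simulate_sgd_variance())
    print("SDE predicted variance:", sde_stationary_variance())
```

For this toy model the exact stationary variance of the discrete iterates is eta * sigma^2 / (a * (2 - eta * a)), so the gap to the SDE prediction shrinks as eta decreases, which is the sense in which the continuous limit holds.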
