What is entropy-SGD and why is it used?

Updated May 16, 2026

Short answer

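Entropy-SGD minimizes a smoothed version of the loss (the negative local entropy) instead of the loss itself, which biases training toward wide, flat minima that tend to generalize better than sharp ones.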

Deep explanation

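Entropy-SGD replaces the training loss f(x) with the negative local entropy, -log ∫ exp(-f(x') - (γ/2)||x - x'||²) dx'. This objective is low only where a whole neighborhood of the parameters x has low loss, not just a single point, and the scope parameter γ sets how large that neighborhood is. Its gradient is γ(x - ⟨x'⟩), where ⟨x'⟩ is the mean of a Gibbs distribution concentrated around x. Each outer step runs an inner loop of stochastic gradient Langevin dynamics (SGLD) to estimate that mean, then moves x toward it. The resulting preference for wide, flat regions of the loss surface is what is meant to improve generalization.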
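A minimal NumPy sketch of the two nested loops, assuming a toy quadratic loss in place of a network. The function and parameter names (`entropy_sgd`, `grad_fn`, `noise_scale`, `avg_weight`) and all hyperparameter values are illustrative choices, not taken from any library; in real training `grad_fn` would return a minibatch gradient.

```python
import numpy as np


def entropy_sgd(grad_fn, x0, outer_steps=50, inner_steps=20,
                outer_lr=0.5, inner_lr=0.1, gamma=1.0,
                noise_scale=1e-3, avg_weight=0.25, seed=0):
    """Toy Entropy-SGD: an SGLD inner loop estimates <x'>, then x moves toward it."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _outer in range(outer_steps):
        x_prime = x.copy()   # Langevin iterate, restarted at the current parameters
        mu = x.copy()        # running estimate of the Gibbs mean <x'>
        for _inner in range(inner_steps):
            # Gradient of f(x') + (gamma / 2) * ||x - x'||^2 with respect to x'
            drift = grad_fn(x_prime) - gamma * (x - x_prime)
            noise = rng.standard_normal(x.shape)
            x_prime = (x_prime - inner_lr * drift
                       + np.sqrt(inner_lr) * noise_scale * noise)
            # Exponential moving average of the inner iterates
            mu = (1.0 - avg_weight) * mu + avg_weight * x_prime
        # Outer step along the local-entropy gradient gamma * (x - <x'>)
        x = x - outer_lr * gamma * (x - mu)
    return x


def quadratic_grad(x):
    """Gradient of the assumed toy loss f(x) = 0.5 * ||x||^2."""
    return x


x_final = entropy_sgd(quadratic_grad, x0=[2.0, -1.0])
print(np.linalg.norm(x_final))
```

On this convex toy problem the iterate simply contracts toward the origin; the sketch shows the mechanics of the update, not the flat-minimum preference, which only matters on non-convex losses.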

Real-world example

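Training deep convolutional networks for image classification, where steering the optimizer toward a wide minimum is meant to narrow the gap between training and test error.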

Common mistakes

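Confusing entropy-SGD with the minibatch noise of standard SGD. Plain SGD's noise is a by-product of sampling minibatches, whereas entropy-SGD deliberately injects Gaussian noise in an inner Langevin loop and changes the objective being optimized.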

Follow-up questions

Why does smoothing help?
What is the inner loop?
