What is Entropy-SGD and why is it used?

Updated May 16, 2026

Short answer

Entropy-SGD minimizes a smoothed version of the loss (the negative local entropy), which biases training toward wide, flat minima rather than sharp ones.

Deep explanation

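Entropy-SGD replaces the original loss with a local entropy objective, which measures how much low-loss volume surrounds a given set of parameters. Each update runs an inner loop of stochastic gradient Langevin dynamics (SGLD) that samples parameters near the current point. The running average of those samples gives an estimate of the local entropy gradient, and the outer step moves the parameters toward that average. Because wide valleys contribute more to the local entropy than narrow ones, the method favors flat minima, which tend to generalize better.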
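The two nested loops can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `entropy_sgd`, its parameter names, and the default values are all assumptions chosen for a toy problem, and it takes an exact gradient function `grad_f` where real training would use mini-batch gradients.

```python
import numpy as np

def entropy_sgd(grad_f, x0, outer_steps=200, inner_steps=20, lr=0.1,
                inner_lr=0.1, gamma=1.0, noise=1e-3, alpha=0.75, seed=0):
    """Toy Entropy-SGD sketch (all names and defaults are illustrative).

    grad_f : callable returning the gradient of the loss at a point
    gamma  : strength of the coupling that keeps samples near x
    noise  : scale of the Langevin noise in the inner loop
    alpha  : weight of the running average over inner-loop samples
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(outer_steps):
        x_prime = x.copy()  # inner-loop sample, starts at the current point
        mu = x.copy()       # running average of the inner-loop samples
        for _ in range(inner_steps):
            # Gradient of f(x') + (gamma / 2) * ||x - x'||^2 with respect to x'
            g = grad_f(x_prime) + gamma * (x_prime - x)
            # Langevin step: gradient descent plus Gaussian noise
            eps = rng.standard_normal(x.shape)
            x_prime = x_prime - inner_lr * g + np.sqrt(inner_lr) * noise * eps
            mu = (1 - alpha) * mu + alpha * x_prime
        # Outer step: gamma * (x - mu) estimates the gradient of the
        # negative local entropy, so x moves toward the sample average
        x = x - lr * gamma * (x - mu)
    return x

# Usage on a quadratic bowl f(x) = 0.5 * ||x||^2, whose gradient is x
x_final = entropy_sgd(lambda x: x, [3.0, -2.0])
```

In this sketch a larger `gamma` pulls the inner-loop samples closer to the current parameters, so the smoothing covers a narrower neighborhood. A smaller `gamma` averages over a wider region.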

Real-world example

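Training deep convolutional networks for image classification, where Entropy-SGD is used in place of plain SGD to reach flatter minima and improve generalization.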

Common mistakes

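  • Confusing Entropy-SGD with the gradient noise of standard SGD. The Langevin noise is injected deliberately in the inner loop to sample around the current parameters, whereas mini-batch noise is a side effect of subsampling the data.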

Follow-up questions

  • Why does smoothing help?
  • What is the inner loop?
