Senior · Gradient Descent
What is entropy-SGD and why is it used?
Updated May 16, 2026
Short answer
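Entropy-SGD optimizes a locally smoothed version of the loss (the negative local entropy) instead of the raw loss. This biases training toward wide, flat minima, which tend to generalize better than sharp ones.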
Deep explanation
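Entropy-SGD replaces the original loss with a local entropy objective, which measures how much low-loss volume lies in a neighborhood of the current parameters. A scope parameter (usually written gamma) controls how tightly that neighborhood is tied to the current parameters. Wide valleys contain more low-loss volume than narrow ones, so maximizing local entropy favors flat minima over sharp ones. The gradient of this objective has no closed form, so each outer update runs an inner loop of stochastic gradient Langevin dynamics (SGLD) around the current parameters, averages the resulting samples, and moves the parameters toward that average. The flat minima found this way are associated with better generalization.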
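The two-level update described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name entropy_sgd, its parameter names, and all default values are hypothetical, and the loss is a toy quadratic with an exact gradient rather than a mini-batch gradient from a network. Here gamma is the scope parameter that ties the inner samples to the current parameters, and alpha is the weight of the running average of the samples.

```python
import numpy as np


def entropy_sgd(grad_f, x0, *, gamma=1.0, outer_lr=0.5, inner_lr=0.1,
                inner_steps=20, noise_scale=1e-2, alpha=0.75,
                outer_steps=100, seed=0):
    """Toy Entropy-SGD loop. All names and defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()

    for _ in range(outer_steps):
        # Inner loop: Langevin dynamics on f(x') + (gamma / 2) * ||x - x'||^2,
        # started at the current parameters x.
        x_prime = x.copy()
        mu = x.copy()  # running average of the inner samples
        for _ in range(inner_steps):
            # Gradient of the inner objective with respect to x'.
            drift = grad_f(x_prime) + gamma * (x_prime - x)
            noise = rng.standard_normal(x.shape)
            x_prime = (x_prime - inner_lr * drift
                       + np.sqrt(inner_lr) * noise_scale * noise)
            mu = (1.0 - alpha) * mu + alpha * x_prime

        # Outer step: the local-entropy gradient estimate is gamma * (x - mu),
        # so the parameters move toward the average of the inner samples.
        x = x - outer_lr * gamma * (x - mu)

    return x


# Sanity check on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_final = entropy_sgd(lambda x: x, x0=[2.0, -1.5])
print(x_final)
```

On this quadratic the iterate should settle close to the origin, up to the injected noise. The example only checks that the update is wired correctly; it does not demonstrate the preference for flat minima, which needs a loss with both sharp and wide basins.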
Real-world example
Improving generalization in deep convolutional networks.
Common mistakes
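- Confusing entropy-SGD with the gradient noise of standard mini-batch SGD. Entropy-SGD changes the objective and adds an explicit inner sampling loop; it is not just SGD with extra noise.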
Follow-up questions
- Why does smoothing help?
- What does the inner loop compute?