Gradient Descent (senior level)
What is implicit temperature in stochastic gradient descent?
Updated May 16, 2026
Short answer
SGD behaves like a thermal system whose implicit temperature is set by its gradient noise: the temperature rises with the learning rate and falls with the batch size.
Deep explanation
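SGD dynamics can be modeled as a noisy physical system, approximately Langevin dynamics, in which minibatch gradient noise plays the role of thermal fluctuations and the noise level acts like a temperature. High temperature lets the iterates explore the loss landscape and hop between basins; low temperature lets them settle into a minimum. Batch size B and learning rate η jointly control this implicit temperature: a larger η scales up every noisy step, while a larger B averages the gradient over more examples and shrinks the noise variance by a factor of 1/B, so the temperature scales roughly as η/B.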
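One common way to make the analogy precise is the continuous-time (SDE) approximation sketched below. It assumes the minibatch gradient noise is Gaussian, isotropic, and independent of θ, with per-example variance σ²; the symbols σ, ξ_k, z_k, and T are notation introduced here. Matching the per-step noise of SGD to discretized Langevin dynamics with time step η gives the temperature:

```latex
% One SGD step with learning rate \eta and minibatch size B:
\theta_{k+1} = \theta_k - \eta\,\bigl(\nabla L(\theta_k) + \xi_k\bigr),
\qquad \xi_k \sim \mathcal{N}\!\left(0,\ \tfrac{\sigma^2}{B}\, I\right)

% Overdamped Langevin dynamics at temperature T, discretized with time step \eta:
\theta_{k+1} = \theta_k - \eta\,\nabla L(\theta_k) + \sqrt{2T\eta}\; z_k,
\qquad z_k \sim \mathcal{N}(0, I)

% Equating the per-step noise variances, \eta^2\sigma^2/B = 2T\eta, gives
T = \frac{\eta\,\sigma^2}{2B},
\qquad p_\infty(\theta) \propto \exp\!\bigl(-L(\theta)/T\bigr)
```

The Boltzmann-like stationary density holds only in this idealized limit; real gradient noise is anisotropic and depends on θ, so the result is best read as a scaling guide (T grows with η and shrinks with B) rather than an exact law.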
Real-world example
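Learning-rate schedules that start high and decay over training (for example step decay or cosine annealing) behave much like simulated annealing: the early high temperature lets the optimizer explore and escape poor regions, and the later low temperature lets it settle into a minimum.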
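A toy illustration of the annealing analogy, written as a sketch under assumptions that are not in the original: a one-dimensional tilted double-well loss L(θ) = (θ² - 1)² + 0.3θ, whose left well is deeper; Gaussian gradient noise of fixed standard deviation (`noise_std`, standing in for σ/√B); and a geometric learning-rate decay. The helper names `loss_grad`, `run_sgd`, and `count_global_min` are hypothetical. The temperature is taken as T = η·noise_std²/2, so the annealed runs start hot (T = 0.3) and cool as η decays, while the cold runs keep the final small learning rate (T = 0.01) throughout.

```python
import random


def loss_grad(theta: float, tilt: float = 0.3) -> float:
    """Gradient of the toy loss L(theta) = (theta**2 - 1)**2 + tilt * theta (hypothetical)."""
    return 4.0 * theta**3 - 4.0 * theta + tilt


def run_sgd(
    seed: int,
    lr_start: float,
    lr_end: float,
    steps: int = 20_000,
    noise_std: float = 4.0,  # assumed minibatch gradient-noise std, i.e. sigma / sqrt(B)
    theta0: float = 1.0,  # start in the shallower, right-hand well
) -> float:
    """Noisy gradient descent with a geometric learning-rate decay.

    The implicit temperature at each step is T = lr * noise_std**2 / 2,
    so shrinking the learning rate cools the system.
    """
    rng = random.Random(seed)
    theta = theta0
    decay = (lr_end / lr_start) ** (1.0 / steps)  # equals 1.0 when lr_start == lr_end
    lr = lr_start
    for _ in range(steps):
        noisy_grad = loss_grad(theta) + rng.gauss(0.0, noise_std)
        theta -= lr * noisy_grad
        lr *= decay
    return theta


def count_global_min(lr_start: float, lr_end: float, seeds: int = 20) -> int:
    """Count runs that finish in the deeper, left-hand well (theta < 0)."""
    return sum(run_sgd(s, lr_start, lr_end) < 0.0 for s in range(seeds))


# With noise_std = 4, T = 8 * lr: lr = 0.0375 gives T = 0.3, lr = 0.00125 gives T = 0.01.
annealed = count_global_min(lr_start=0.0375, lr_end=0.00125)
cold = count_global_min(lr_start=0.00125, lr_end=0.00125)
print(f"annealed: {annealed}/20 runs in the deeper well; cold: {cold}/20")
```

Starting from the shallower well, the cold runs cannot cross the barrier and stay trapped, while the annealed runs cross it early, when the temperature is high, and most of them freeze into the deeper well as they cool. The exact counts depend on the seeds, the schedule length, and the noise level.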
Common mistakes
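- Treating SGD as purely deterministic optimization. At a fixed learning rate, the iterates do not converge to a single point; they keep fluctuating around a minimum, with a spread set by the temperature.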
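A minimal sketch of that fluctuation, assuming a one-dimensional quadratic loss L(θ) = aθ²/2 and Gaussian per-example gradient noise with standard deviation σ (both assumptions, as are the hypothetical helper names `simulate_variance` and `predicted_variance`). Averaging B examples shrinks the noise variance to σ²/B, and the resulting linear SGD recursion has stationary variance ησ²/(B·a·(2 - ηa)). For small ηa this is about T/a with T = ησ²/(2B), so the spread depends on η and B mainly through η/B.

```python
import random


def simulate_variance(lr: float, batch_size: int, curvature: float = 1.0,
                      per_example_std: float = 1.0, steps: int = 100_000,
                      burn_in: int = 5_000, seed: int = 0) -> float:
    """Run SGD on L(theta) = curvature * theta**2 / 2 and return the variance of the iterates."""
    rng = random.Random(seed)
    # Averaging batch_size independent per-example gradients divides the noise variance by B.
    noise_std = per_example_std / batch_size ** 0.5
    theta = 0.0  # start exactly at the minimum; all movement comes from gradient noise
    samples = []
    for step in range(steps):
        noisy_grad = curvature * theta + rng.gauss(0.0, noise_std)
        theta -= lr * noisy_grad
        if step >= burn_in:
            samples.append(theta)
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)


def predicted_variance(lr: float, batch_size: int, curvature: float = 1.0,
                       per_example_std: float = 1.0) -> float:
    """Exact stationary variance of theta_{k+1} = (1 - lr*a) * theta_k - lr * noise_k."""
    return lr * per_example_std ** 2 / (batch_size * curvature * (2.0 - lr * curvature))


for lr, batch in [(0.05, 1), (0.05, 4), (0.2, 4)]:
    print(f"lr={lr}, B={batch}, lr/B={lr / batch}: "
          f"simulated {simulate_variance(lr, batch):.4f}, "
          f"predicted {predicted_variance(lr, batch):.4f}")
```

The iterate never stops moving even though it starts at the exact minimum. Quadrupling the batch at a fixed learning rate cuts the variance by a factor of four, while the (η = 0.05, B = 1) and (η = 0.2, B = 4) runs share η/B and end up with similar spreads; the small remaining gap comes from the (2 - ηa) discretization factor.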
Follow-up questions
- What happens at high temperature?
- What is annealing?