Senior | Gradient Descent
What is implicit regularization of SGD?
Updated May 16, 2026
Short answer
Even with no explicit penalty term, SGD biases training toward simpler solutions, often described as flat minima, that tend to generalize better.
Deep explanation
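Even without explicit regularization terms such as weight decay or dropout, SGD tends to converge to flat minima. Each update uses a gradient estimated from a random mini-batch, so it is a noisy version of the full gradient, and the noise is larger for bigger learning rates and smaller batch sizes. This noise makes it hard for the iterates to settle in sharp, narrow minima, so they tend to end up in wider regions of the loss surface. The effect acts as an implicit regularizer, favoring solutions that are robust to perturbations of the parameters and to changes in the training sample. The link between flatness and generalization is largely empirical and still debated.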
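To make the "noise in updates" concrete, here is a minimal NumPy sketch. It assumes a toy linear regression problem (the data, sizes, and the helper names `grad` and `noise_stats` are all hypothetical, chosen only for illustration). It checks that a mini-batch gradient is centered on the full gradient and that its spread shrinks as the batch size grows. It does not demonstrate flat minima themselves, only the noise that the argument relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: linear regression with squared loss.
n, d = 2048, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
w = np.zeros(d)  # fixed point at which all gradients are evaluated


def grad(idx):
    """Gradient of 0.5 * mean((X w - y)^2) over the rows in idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)


full_grad = grad(np.arange(n))


def noise_stats(batch_size, trials=2000):
    """Return (norm of the average deviation, mean squared deviation)
    of mini-batch gradients from the full gradient."""
    devs = np.stack([
        grad(rng.choice(n, size=batch_size, replace=False)) - full_grad
        for _ in range(trials)
    ])
    bias = np.linalg.norm(devs.mean(axis=0))
    variance = np.mean(np.sum(devs ** 2, axis=1))
    return bias, variance


bias_8, var_8 = noise_stats(8)
bias_64, var_64 = noise_stats(64)
print(f"batch size 8:  mean squared deviation {var_8:.4f}")
print(f"batch size 64: mean squared deviation {var_64:.4f}")
```

In this sketch the mean squared deviation falls roughly in proportion to 1 / batch size, which is why small batches (and large learning rates, which scale the step taken along each noisy gradient) are usually associated with stronger implicit regularization.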
Real-world example
Deep neural networks trained with SGD often generalize well even without heavy explicit regularization, despite having enough parameters to fit the training set exactly.
Common mistakes
- Assuming explicit regularization is always required for generalization.
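A small, well-understood case shows that an optimizer can pick a "simple" solution with no penalty term at all. The sketch below is a linear toy model, not a deep network, and it illustrates a different mechanism from the flat-minima argument. In overparameterized least squares, SGD started from zero stays in the row space of the data, so it converges to the minimum L2-norm solution among the infinitely many that fit the data exactly. The problem sizes, learning rate, and step count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical overparameterized problem: 20 samples, 100 parameters,
# so infinitely many weight vectors fit the data exactly.
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)  # zero initialization is what yields the min-norm result
lr = 0.005       # assumed step size, small enough for stability here
for step in range(20000):
    i = rng.integers(n)                 # pick one random sample
    w -= lr * (X[i] @ w - y[i]) * X[i]  # plain SGD step, no penalty term

# Minimum L2-norm interpolating solution, computed via the pseudo-inverse.
w_min_norm = np.linalg.pinv(X) @ y

# Another exact fit: add a component from the null space of X.
v = rng.normal(size=d)
v -= np.linalg.pinv(X) @ (X @ v)        # remove the row-space part of v
w_other = w_min_norm + v

print("distance from SGD solution to min-norm solution:",
      np.linalg.norm(w - w_min_norm))
print("norm of SGD solution:      ", np.linalg.norm(w))
print("norm of another exact fit: ", np.linalg.norm(w_other))
```

Both `w` and `w_other` fit the training data, but SGD lands on the smaller-norm one without being told to. With a nonzero initialization the result would instead be the interpolating solution closest to the starting point.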
Follow-up questions
- Why does gradient noise act as a regularizer?
- Is this unique to SGD?