What is implicit regularization of SGD?

Updated May 16, 2026

Short answer

Implicit regularization is the tendency of SGD to favor simpler minima that generalize better, even though the loss contains no explicit regularization term.

Deep explanation

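Even without explicit regularization terms such as an L2 penalty, SGD tends to converge to flat minima. Each minibatch gradient is a noisy estimate of the full gradient, and that noise, which roughly grows with the learning rate and shrinks with the batch size, makes it hard for the iterates to settle in sharp, narrow basins. The effect acts as an implicit regularizer: it favors solutions whose loss changes little under small perturbations of the parameters or of the training samples.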
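A small sketch can make the idea of a regularizer that appears nowhere in the loss concrete. It does not reproduce the flat-minima effect in deep networks. Instead it uses overparameterized linear least squares, a simpler setting where the implicit bias can be checked directly: minibatch SGD started from zero ends up at the minimum-L2-norm solution among the infinitely many that fit the data. The problem sizes, the seeds, the step-size rule, and the helper name sgd_least_squares are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 20 samples, 100 features, so infinitely many
# weight vectors fit the training data exactly.
n_samples, n_features = 20, 100
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)


def sgd_least_squares(X, y, batch_size=4, n_steps=20_000, seed=1):
    """Minibatch SGD on 0.5 * mean((X @ w - y)**2), starting from w = 0.

    The loss has no penalty term on w.
    """
    step_rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # Conservative step size (an assumption chosen for stability here).
    lr = 0.5 / np.max(np.sum(X**2, axis=1))
    for _ in range(n_steps):
        idx = step_rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        w -= lr * X[idx].T @ residual / batch_size
    return w


w_sgd = sgd_least_squares(X, y)

# Minimum-L2-norm interpolating solution, computed in closed form.
w_min_norm = np.linalg.pinv(X) @ y

# A different exact fit: add a direction that the training data cannot see.
v = rng.standard_normal(n_features)
null_part = v - np.linalg.pinv(X) @ (X @ v)
w_other = w_min_norm + null_part

print("train residual, SGD solution:  ", np.linalg.norm(X @ w_sgd - y))
print("train residual, other solution:", np.linalg.norm(X @ w_other - y))
print("norm of SGD solution:          ", np.linalg.norm(w_sgd))
print("norm of other solution:        ", np.linalg.norm(w_other))
print("distance SGD to min-norm:      ", np.linalg.norm(w_sgd - w_min_norm))
```

Both weight vectors fit the training data, yet SGD lands on the smallest-norm one because every update is a combination of training inputs, so the iterates never leave the span of the data. This linear argument does not carry over directly to deep networks, where the flat-minima picture described above is the more common explanation.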

Real-world example

Deep neural networks often generalize well even without heavy explicit regularization, despite having enough capacity to memorize their training data.

Common mistakes

Assuming explicit regularization is always required for generalization.

Follow-up questions

Why does noise help regularization?
Is this unique to SGD?
