What is implicit regularization of SGD?

Updated May 16, 2026

Short answer

SGD implicitly biases training toward flatter minima, which tend to generalize better, even when the loss contains no explicit regularization term.

Deep explanation

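Even without explicit regularization terms, SGD tends to converge to flat minima. Each update uses a gradient estimated from a mini-batch rather than the full dataset, so the update is the full-batch gradient plus noise. That noise makes it hard for the iterates to settle in sharp, narrow minima, so it acts as an implicit regularizer, favoring solutions that are robust to perturbations in parameter space and to changes in the data sample.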
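To make the "noise in updates" concrete, here is a minimal sketch that measures how much a mini-batch gradient fluctuates at different batch sizes. It assumes a toy 1-D linear regression; the data, the helper names (make_data, grad, minibatch_grad_std), and the batch sizes are all hypothetical choices for illustration. It only shows where the noise comes from and does not by itself demonstrate convergence to flat minima.

```python
import random
import statistics

def make_data(n=200, seed=0):
    # Hypothetical toy dataset: y = 3x + Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

def grad(w, xs, ys, idx):
    # Gradient w.r.t. w of the mean of 0.5 * (w*x - y)^2 over the examples in idx.
    idx = list(idx)
    return sum((w * xs[i] - ys[i]) * xs[i] for i in idx) / len(idx)

def minibatch_grad_std(w, xs, ys, batch_size, trials=2000, seed=1):
    # Standard deviation of the mini-batch gradient over many random batches.
    rng = random.Random(seed)
    n = len(xs)
    grads = [
        grad(w, xs, ys, rng.sample(range(n), batch_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(grads)

xs, ys = make_data()
w = 0.0  # evaluate the noise at an arbitrary point in parameter space

full_grad = grad(w, xs, ys, range(len(xs)))
noise_by_batch = {b: minibatch_grad_std(w, xs, ys, b) for b in (1, 8, 64)}

print("full-batch gradient:", full_grad)
for b, s in noise_by_batch.items():
    print(f"batch size {b:>2}: gradient std = {s:.4f}")
```

In this sketch the spread of the gradient estimate shrinks as the batch grows, which is why batch size (together with the learning rate) is commonly treated as a knob on the strength of SGD's noise.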

Real-world example

Deep neural networks often generalize well even without heavy explicit regularization.

Common mistakes

  • Assuming explicit regularization is always required for generalization.

Follow-up questions

  • Why does noise help regularization?
  • Is this unique to SGD?
