What is the implicit bias toward minimum-norm solutions?

Updated May 16, 2026

Short answer

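When many solutions fit the training data exactly, gradient descent started at or near zero tends to converge to the one with the smallest norm, even though the loss contains no term that asks for it.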

Deep explanation

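In overparameterized linear models, and in some deep learning regimes, gradient descent (GD) implicitly selects the solution with the smallest L2 norm among all zero-training-error solutions. For least squares the reason is geometric: every gradient is a linear combination of the training inputs, so iterates started at zero never leave the span of the data, and the only interpolating solution inside that span is the minimum-norm one. From a nonzero initialization, GD instead converges to the interpolating solution closest to the starting point. No explicit regularizer is involved, yet this preference strongly influences how the model generalizes.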
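
The linear least-squares case can be worked out in a few lines. The sketch below is illustrative rather than a general proof: the symbols X (data matrix), y (targets), w_t (iterates), η (step size), and α are notation introduced here, and it assumes more features than samples, full row rank, zero initialization, and a step size small enough for GD to converge.

```latex
\begin{aligned}
L(w) &= \tfrac{1}{2}\lVert Xw - y\rVert_2^2,
  \qquad X \in \mathbb{R}^{n \times d},\ n < d,\ \operatorname{rank}(X) = n \\
w_{t+1} &= w_t - \eta\, X^\top (X w_t - y), \qquad w_0 = 0 \\
&\Rightarrow\ w_t \in \operatorname{range}(X^\top) \ \text{for all } t \\
w_\infty &= X^\top \alpha,\quad X w_\infty = y
  \ \Rightarrow\ \alpha = (X X^\top)^{-1} y \\
w_\infty &= X^\top (X X^\top)^{-1} y = X^{+} y \\
w = w_\infty + v,\ Xv = 0
  \ &\Rightarrow\ \lVert w\rVert_2^2
  = \lVert w_\infty\rVert_2^2 + \lVert v\rVert_2^2
  \ \ge\ \lVert w_\infty\rVert_2^2
\end{aligned}
```

The last line uses the fact that range(X^T) and the null space of X are orthogonal, so any other interpolating solution adds a null-space component that can only increase the norm.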

Real-world example

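Underdetermined linear regression (more features than samples) trained with gradient descent from zero initialization: the learned weights converge to the pseudoinverse (minimum-norm) least-squares solution.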
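
A small numerical check of this example, as a sketch under stated assumptions: the data is random, the sizes (5 samples, 20 features), the step count, and the helper name gradient_descent are all hypothetical choices made for illustration, and plain full-batch GD on squared loss is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined problem: 5 samples, 20 features (arbitrary illustrative sizes).
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)


def gradient_descent(X, y, w0, steps=5000):
    """Full-batch GD on 0.5 * ||Xw - y||^2 (hypothetical helper)."""
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w


# GD from zero initialization.
w_gd = gradient_descent(X, y, np.zeros(d))

# Closed-form minimum-norm interpolating solution via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# GD from a random nonzero initialization: also fits the data,
# but keeps the null-space component of its starting point.
w_other = gradient_descent(X, y, rng.standard_normal(d))

print("zero-init GD matches pinv solution:", np.allclose(w_gd, w_min_norm, atol=1e-6))
print("random-init GD fits the data:      ", np.allclose(X @ w_other, y, atol=1e-6))
print("norm (zero init):  ", np.linalg.norm(w_gd))
print("norm (random init):", np.linalg.norm(w_other))
```

Both runs reach zero training error, but only the zero-initialized run lands on the minimum-norm solution. The randomly initialized run ends at a different interpolating solution with a larger norm.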

Common mistakes

  • Assuming all solutions with zero loss are equivalent. They fit the training data equally well but can generalize very differently.

Follow-up questions

  • Why does the minimum norm matter?
  • Does SGD share this bias?
