Senior · Gradient Descent
What is the implicit bias toward minimum-norm solutions?
Updated May 16, 2026
Short answer
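Many parameter settings can fit the training data equally well. Among them, gradient descent tends to converge to one with small norm (in linear least squares, exactly the minimum-norm one), even though the loss contains no norm penalty.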
Deep explanation
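Consider underdetermined linear least squares, where there are more parameters than training points. Infinitely many weight vectors reach zero training error. Gradient descent started from zero, with a step size small enough to converge, returns the one with the smallest L2 norm.

The reason is the update geometry. Every gradient X^T (Xw - y) is a linear combination of the training inputs, so the iterates never leave the span of the data (the row space of X). The only interpolating solution in that span is the minimum-norm one.

Initialization matters too. With a nonzero starting point w_0, GD instead returns the interpolating solution closest to w_0.

Similar implicit biases appear in some deep learning regimes. There, however, the quantity being minimized depends on the architecture and parameterization, and it is not always the L2 norm of the weights.

This matters for generalization. The training loss cannot tell interpolating solutions apart, so this implicit choice decides how the model behaves on new data.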
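The argument can be written out for the linear least-squares case. This is a minimal derivation under some assumptions introduced here for illustration, not taken from the original text:

- X is an n×d design matrix with n < d and full row rank.
- The step size η is fixed.
- X^+ denotes the Moore-Penrose pseudoinverse.

The key step is that every update lies in the row space of X. As a result, GD never changes the null-space component of w_0, and that component is exactly what separates one interpolating solution from another.

```latex
\begin{aligned}
L(w) &= \tfrac{1}{2}\,\lVert Xw - y\rVert_2^2, \qquad \nabla L(w) = X^\top (Xw - y) \\
w_{t+1} &= w_t - \eta\, X^\top (X w_t - y) \;\Rightarrow\; w_t - w_0 \in \operatorname{row}(X) \ \text{for all } t \\
w_\infty &= X^{+} y + (I - X^{+}X)\, w_0 \qquad \text{for } 0 < \eta < 2/\sigma_{\max}(X)^2 \\
w_0 = 0 \;&\Rightarrow\; w_\infty = X^\top (X X^\top)^{-1} y = X^{+} y \\
Xw = y \;&\Rightarrow\; w = X^{+} y + v,\ v \in \operatorname{null}(X),\ \ \lVert w\rVert_2^2 = \lVert X^{+} y\rVert_2^2 + \lVert v\rVert_2^2 \ \ge\ \lVert X^{+} y\rVert_2^2
\end{aligned}
```

The last line holds because X^+ y lies in the row space, which is orthogonal to the null space. Any other exact fit therefore only adds norm.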
Real-world example
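Fitting a linear regression with more features than samples by gradient descent from a zero initialization. The learned weights coincide with the minimum-norm least-squares solution pinv(X) y, given by the Moore-Penrose pseudoinverse.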
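This example can be checked numerically. The sketch below makes several illustrative choices that are not from the original:

- It uses NumPy and a random overparameterized problem.
- The sizes, seed, step size and step count are arbitrary.
- The helper name gradient_descent_least_squares is hypothetical.

The code runs plain full-batch GD from zero and compares the result with np.linalg.pinv(X) @ y.

```python
import numpy as np

def gradient_descent_least_squares(X, y, lr, steps):
    """Hypothetical helper: full-batch GD on 0.5 * ||X w - y||^2, started from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y)  # a combination of the rows of X, so w stays in the row space
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
n, d = 5, 20  # fewer samples than features: infinitely many exact fits
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# np.linalg.norm(X, ord=2) is the largest singular value; lr < 2 / sigma_max^2 ensures convergence.
lr = 1.0 / np.linalg.norm(X, ord=2) ** 2
w_gd = gradient_descent_least_squares(X, y, lr, steps=10_000)

w_min_norm = np.linalg.pinv(X) @ y  # minimum-norm least-squares solution

print("training residual:", np.linalg.norm(X @ w_gd - y))
print("distance to pinv solution:", np.linalg.norm(w_gd - w_min_norm))
```

Both printed quantities should be at the level of floating-point round-off. If you start GD from a random w_0 instead of zeros, the residual still goes to zero, but the distance to the pinv solution generally does not.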
Common mistakes
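- Assuming all solutions with zero training loss are equivalent. They agree on the training points but can make very different predictions elsewhere, and which one you get depends on the optimizer and its initialization.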
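Here is a small sketch of why zero training loss does not pin down the model. It assumes NumPy, and the data, seed and variable names are hypothetical.

The code builds a second exact fit by adding a null-space direction to the minimum-norm solution. It then compares the two fits' norms and their predictions on an unseen input.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

X_pinv = np.linalg.pinv(X)
w_min = X_pinv @ y  # minimum-norm interpolant

# Project a random vector onto the null space of X; adding it keeps X @ w == y.
null_proj = np.eye(d) - X_pinv @ X
w_other = w_min + null_proj @ rng.standard_normal(d)

x_new = rng.standard_normal(d)  # an unseen input

for name, w in [("min-norm", w_min), ("other", w_other)]:
    print(f"{name:9s} train residual={np.linalg.norm(X @ w - y):.1e} "
          f"norm={np.linalg.norm(w):.3f} prediction={x_new @ w:.3f}")
```

Both weight vectors fit the training data to machine precision. The second has a strictly larger norm, because its extra component is orthogonal to the minimum-norm solution. It also generally predicts a different value for x_new.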
Follow-up questions
- Why does minimum norm matter?
- Does SGD share this bias?