Senior · Gradient Descent
What is the implicit bias toward simplicity in Gradient Descent?
Updated May 16, 2026
Short answer
Among the many solutions that fit the training data, Gradient Descent tends to converge to simpler ones, even without explicit regularization.
Deep explanation
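Empirically and theoretically, Gradient Descent often converges to lower-complexity solutions (e.g., smoother or lower-norm functions). This bias arises from the optimization geometry, the initialization, and the path the iterates take, not from an explicit penalty term. A well-understood case is under-determined least squares: started from zero, Gradient Descent converges to the minimum L2-norm solution that fits the data. This implicit bias is one proposed explanation for why over-parameterized models generalize, though it is not a complete account.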
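The linear case can be checked numerically. The sketch below is a toy setup, not something from the text above: the problem sizes, the random data, and the helper name gradient_descent are assumptions made for illustration. It runs plain Gradient Descent on a least-squares loss with more unknowns than equations, once from a zero initialization and once from a random one, and compares both results with the minimum-norm solution computed via the pseudoinverse.

```python
import numpy as np

def gradient_descent(X, y, w0, lr, steps):
    """Plain gradient descent on 0.5 * ||X w - y||^2, with no penalty term."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)   # gradient of the squared-error loss
    return w

rng = np.random.default_rng(0)
n_samples, n_features = 5, 20          # fewer equations than unknowns (illustrative sizes)
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Step size 1/L, where L is the largest eigenvalue of X^T X.
lr = 1.0 / np.linalg.norm(X, 2) ** 2

# Reference: the minimum L2-norm solution among all w with X w = y.
w_min_norm = np.linalg.pinv(X) @ y

w_zero = gradient_descent(X, y, np.zeros(n_features), lr, 20000)
w_rand = gradient_descent(X, y, rng.normal(size=n_features), lr, 20000)

print("residual, zero init:  ", np.linalg.norm(X @ w_zero - y))
print("residual, random init:", np.linalg.norm(X @ w_rand - y))
print("norm, zero init:      ", np.linalg.norm(w_zero))
print("norm, random init:    ", np.linalg.norm(w_rand))
print("norm, min-norm ref:   ", np.linalg.norm(w_min_norm))
```

Both runs fit the data, but only the zero-initialized run lands on the minimum-norm solution. The updates always lie in the row space of X, so whatever component of the starting point lies outside that space is never changed. This is the dependence on initialization in miniature, and it holds for this linear setting only; it is not a general guarantee for neural networks.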
Real-world example
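A neural network trained without weight decay or dropout often learns a smooth decision boundary, even though it has enough capacity to fit a jagged one that follows the noise.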
Common mistakes
- Assuming simplicity must be explicitly enforced via regularization.
Follow-up questions
- What is Occam’s razor in ML?
- Is this bias toward simplicity always guaranteed?