Senior · Gradient Descent
What is the implicit bias of initialization in Gradient Descent?
Updated May 16, 2026
Short answer
When many solutions fit the data equally well, the initialization determines which one Gradient Descent converges to.
Deep explanation
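In non-convex or over-parameterized settings the loss has many minima, and gradient descent (GD) started from different initializations converges to different ones. The initialization therefore acts as an implicit bias: nothing in the loss function prefers one minimum over another, but the starting point selects a region of the loss landscape, which shapes the convergence trajectory and the properties of the final solution.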
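A small illustration, as a minimal sketch rather than a definitive implementation: in under-determined least squares (more parameters than samples), every GD run below fits the data exactly, yet the solution it reaches depends on the starting point. GD started from zero reaches the minimum-norm solution, while GD started from a random point keeps the part of its initialization that the data cannot see. The problem sizes, the step-size choice, and the helper name gradient_descent are assumptions made for this example.

```python
import numpy as np

def gradient_descent(X, y, w0, steps=5000):
    """Plain GD on the loss 0.5 * ||X w - y||^2, starting from w0."""
    # Step size 1 / (largest singular value)^2 keeps the iteration stable.
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)  # gradient of the squared loss
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))  # 5 samples, 20 parameters: over-parameterized
y = rng.normal(size=5)

w0_rand = rng.normal(size=20)
w_zero = gradient_descent(X, y, np.zeros(20))  # zero initialization
w_rand = gradient_descent(X, y, w0_rand)       # random initialization

# Reference solutions computed in closed form with the pseudoinverse.
X_pinv = np.linalg.pinv(X)
w_min_norm = X_pinv @ y                          # minimum-norm interpolant
w_rand_expected = w0_rand + X_pinv @ (y - X @ w0_rand)

print("residual (zero init):", np.linalg.norm(X @ w_zero - y))
print("residual (rand init):", np.linalg.norm(X @ w_rand - y))
print("distance between the two solutions:", np.linalg.norm(w_zero - w_rand))
print("norm (zero init):", np.linalg.norm(w_zero))
print("norm (rand init):", np.linalg.norm(w_rand))
```

Both residuals should be near zero, so the two runs are equally good by the loss alone, while the solutions themselves differ and the zero-initialized one has the smaller norm. In this linear setting the bias can be written down exactly. Deep networks show the same qualitative effect, but generally without such a closed form.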
Real-world example
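Training the same deep learning architecture on the same data with different random seeds can produce models with similar training loss but different weights and different predictions on individual inputs.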
Common mistakes
- Assuming initialization only affects convergence speed, not the final solution.
Follow-up questions
- Why does initialization matter?
- What is symmetry breaking?