What is the implicit bias of initialization in gradient descent?

Updated May 16, 2026

Short answer

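Initialization influences which solution gradient descent converges to. When the loss has many minimizers, the starting point determines which one is reached, so it affects the properties of the final model and not only how fast training converges.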

Deep explanation

In non-convex or over-parameterized settings, the loss has many minima, and gradient descent (GD) started from different initializations can converge to different ones. This acts as an implicit bias: without any explicit regularizer, the initialization selects a region of the loss landscape, which shapes the convergence trajectory and the properties of the final solution.
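A minimal sketch of this effect, assuming an over-parameterized least-squares problem (3 equations, 6 unknowns) as a stand-in for a model with many zero-loss solutions. The problem sizes, the step size, and the helper name gradient_descent are illustrative choices, not part of the original answer.

```python
import numpy as np

def gradient_descent(A, b, x0, steps=20000):
    """Plain GD on f(x) = 0.5 * ||A x - b||^2, starting from x0."""
    # Step size 1 / ||A||_2^2 is below the stability limit 2 / ||A||_2^2.
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * A.T @ (A @ x - b)  # gradient of the least-squares loss
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 6))  # fewer equations than unknowns: many exact solutions
b = rng.normal(size=3)

x_zero = gradient_descent(A, b, np.zeros(6))         # zero initialization
x_rand = gradient_descent(A, b, rng.normal(size=6))  # random initialization
x_min_norm = np.linalg.pinv(A) @ b                   # minimum-norm solution

print("residual (zero init):  ", np.linalg.norm(A @ x_zero - b))
print("residual (random init):", np.linalg.norm(A @ x_rand - b))
print("norm (zero init):      ", np.linalg.norm(x_zero))
print("norm (random init):    ", np.linalg.norm(x_rand))
```

Both runs fit the data, but they end at different points. In this setting GD never changes the component of x that lies in the null space of A, so whatever the initialization puts there stays in the final solution. Starting from zero therefore gives the minimum-norm solution, while a random start gives a different zero-loss solution with a larger norm.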

Real-world example

Training the same deep network with different random seeds, which changes the initial weights, typically yields different final models.
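A toy version of the seed effect, assuming a one-dimensional non-convex loss f(w) = (w^2 - 1)^2 with two minima at w = -1 and w = +1 in place of a real network. The loss, the sampling range, and the function names descend and train are hypothetical choices made for illustration.

```python
import numpy as np

def grad(w):
    # Derivative of the double-well loss f(w) = (w**2 - 1)**2.
    return 4.0 * w * (w ** 2 - 1.0)

def descend(w0, lr=0.05, steps=500):
    """Run plain GD on the double-well loss from the starting point w0."""
    w = float(w0)
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def train(seed):
    """Draw a seed-dependent initial weight, then run GD from it."""
    rng = np.random.default_rng(seed)
    w0 = rng.uniform(-1.5, 1.5)
    return w0, descend(w0)

for seed in range(5):
    w0, w_final = train(seed)
    print(f"seed={seed}  init={w0:+.3f}  final={w_final:+.3f}")
```

With this step size and sampling range, GD converges to the minimum on the same side of zero as the initial weight, so the seed alone decides which of the two solutions is returned.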

Common mistakes

  • Assuming initialization only affects convergence speed, not the final solution.

Follow-up questions

  • Why does initialization matter?
  • What is symmetry breaking?
