Senior · Gradient Descent
What is gradient descent in over-parameterized models?
Updated May 16, 2026
Short answer
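An over-parameterized model has more parameters than training examples (constraints), so many different parameter settings fit the training data exactly. Which of them gradient descent (GD) reaches depends on its dynamics: initialization, step size, and parameterization.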
Deep explanation
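In the over-parameterized regime, gradient descent often reaches zero training loss even though the loss landscape is non-convex. Zero-loss solutions are typically not unique: there are infinitely many of them. GD does not pick among them arbitrarily; its dynamics favor particular structured solutions, an effect called implicit bias. For example, in under-determined linear least squares, GD started from zero converges to the minimum-L2-norm solution that fits the data exactly. Implicit bias is one leading explanation for why deep networks can generalize well despite being expressive enough to fit almost any labeling of the training set.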
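The linear case of implicit bias is small enough to check by hand. The sketch below is an illustration under stated assumptions, not a statement about deep networks: the 2-example, 4-parameter dataset, the step size, the step count, and all function names are made up for the example. It runs plain full-batch GD on a squared loss from two starting points and compares the results with the closed-form minimum-norm solution w* = X^T (X X^T)^{-1} y.

```python
# Under-determined least squares: 2 training examples, 4 parameters,
# so infinitely many w satisfy X w = y exactly. (Toy data, chosen for illustration.)
X = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, -1.0]]
y = [1.0, 2.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(w):
    return dot(w, w) ** 0.5


def loss(w):
    """Squared-error training loss 0.5 * ||X w - y||^2."""
    return 0.5 * sum((dot(row, w) - t) ** 2 for row, t in zip(X, y))


def gradient_descent(w0, lr=0.1, steps=2000):
    """Plain full-batch GD on the loss above, starting from w0."""
    w = list(w0)
    for _ in range(steps):
        residual = [dot(row, w) - t for row, t in zip(X, y)]
        # The gradient X^T (X w - y) is always a combination of the rows of X,
        # so GD never changes the component of w orthogonal to those rows.
        grad = [sum(r * row[j] for r, row in zip(residual, X))
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def min_norm_solution():
    """Closed form w* = X^T (X X^T)^{-1} y, using an explicit 2x2 inverse."""
    a, b, d = dot(X[0], X[0]), dot(X[0], X[1]), dot(X[1], X[1])
    det = a * d - b * b
    z0 = (d * y[0] - b * y[1]) / det
    z1 = (a * y[1] - b * y[0]) / det
    return [z0 * X[0][j] + z1 * X[1][j] for j in range(len(X[0]))]


w_min = min_norm_solution()
w_gd = gradient_descent([0.0, 0.0, 0.0, 0.0])     # zero initialization
w_other = gradient_descent([1.0, 1.0, 1.0, 1.0])  # non-zero initialization

for name, w in [("min-norm", w_min), ("GD from 0", w_gd), ("GD from 1s", w_other)]:
    print(name, [round(v, 4) for v in w], "loss:", loss(w), "norm:", round(norm(w), 4))
```

Both GD runs drive the training loss to zero, but they end at different interpolating solutions. The run started from zero matches the minimum-norm solution, because its iterates never leave the span of the rows of X. The run started elsewhere keeps the part of its initialization that the data cannot see, and ends with a larger norm.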
Real-world example
A deep neural network with millions of parameters that fits a small training set perfectly (zero training error) and still performs well on held-out data.
Common mistakes
- Assuming overfitting is inevitable in over-parameterized models.
Follow-up questions
- Why do such models generalize?
- What is double descent?