What is gradient descent in over-parameterized models?

Updated May 16, 2026

Short answer

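Over-parameterized models have more parameters than training examples, so many different parameter settings fit the training data exactly. This changes both how gradient descent (GD) converges and which of those solutions it ends up at.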

Deep explanation

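In over-parameterized regimes, gradient descent often reaches zero training loss even though the loss landscape is non-convex. Among the infinitely many parameter settings that fit the data, GD does not pick arbitrarily: it tends to select structured ones, such as low-norm or large-margin solutions. This tendency is called implicit bias, and it is one proposed explanation for why deep networks can generalize well despite being expressive enough to memorize their training data.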
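Implicit bias is easiest to check in the simplest over-parameterized setting, linear least squares with more parameters than examples. The sketch below is illustrative only: the problem sizes, the random data, the step-size choice, and the helper name `gd_least_squares` are all assumptions, not part of the original answer. It runs plain GD from a zero initialization and compares the result with the minimum-norm interpolating solution from the pseudoinverse.

```python
import numpy as np

def gd_least_squares(X, y, lr, steps):
    """Plain gradient descent on 0.5 * ||X w - y||^2, starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)  # gradient of the squared loss
    return w

rng = np.random.default_rng(0)
n_samples, n_params = 20, 100  # assumed sizes: more parameters than examples
X = rng.standard_normal((n_samples, n_params))
y = rng.standard_normal(n_samples)

# Step size 1 / sigma_max(X)^2 keeps GD stable on this quadratic loss.
lr = 1.0 / np.linalg.norm(X, 2) ** 2
w_gd = gd_least_squares(X, y, lr, steps=5000)

# Minimum-L2-norm solution of X w = y, via the Moore-Penrose pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# A different zero-loss solution: add a component from the null space of X.
v = rng.standard_normal(n_params)
null_component = v - np.linalg.pinv(X) @ (X @ v)
w_other = w_min_norm + null_component

train_residual = np.linalg.norm(X @ w_gd - y)
gap_to_min_norm = np.linalg.norm(w_gd - w_min_norm)
other_residual = np.linalg.norm(X @ w_other - y)

print("GD training residual:      ", train_residual)
print("Distance to min-norm soln: ", gap_to_min_norm)
print("Other solution residual:   ", other_residual)
print("Norms (GD vs other):       ", np.linalg.norm(w_gd), np.linalg.norm(w_other))
```

Both `w_gd` and `w_other` fit the training data, but GD lands on the smaller-norm one. The reason is that every gradient lies in the row space of `X`, so iterates started at zero never pick up a null-space component. This argument is specific to the linear case; for deep networks the form of the implicit bias is less well understood.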

Real-world example

Deep neural networks with millions of parameters can fit small datasets perfectly, reaching zero training error, and still perform well on held-out data.

Common mistakes

  • Assuming overfitting is inevitable in over-parameterized models.

Follow-up questions

  • Why do such models generalize?
  • What is double descent?
