What are sharp vs flat minima in Gradient Descent?

Updated May 16, 2026

Short answer

Flat minima are wide regions of low loss; sharp minima are narrow basins where the loss rises steeply as soon as the parameters move away from the minimum.

Deep explanation

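Flat minima tend to generalize better because small perturbations of the parameters do not significantly increase the loss. The training and test loss surfaces are never identical, so a solution that stays low over a wide neighborhood is more likely to stay low on unseen data. Sharp minima are highly sensitive to such perturbations and are often associated with overfitting. Sharpness is commonly quantified by the curvature of the loss at the minimum, for example the largest eigenvalue of the Hessian.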
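To make "small perturbations do not significantly increase the loss" concrete, here is a minimal sketch that compares two hypothetical one-dimensional quadratic basins with the same minimum loss but different curvature. It measures sharpness as the worst-case loss increase within a small radius around the minimum. The functions `sharp_loss`, `flat_loss`, and `sharpness`, the curvatures, and the radius are illustrative assumptions, not part of any library.

```python
def sharp_loss(w: float) -> float:
    # Hypothetical narrow basin: minimum 0 at w = 1, second derivative 100.
    return 50.0 * (w - 1.0) ** 2


def flat_loss(w: float) -> float:
    # Hypothetical wide basin: minimum 0 at w = -1, second derivative 1.
    return 0.5 * (w + 1.0) ** 2


def sharpness(loss, w_star: float, radius: float = 0.1, n: int = 21) -> float:
    """Worst-case loss increase over perturbations |delta| <= radius.

    The perturbations are sampled on an evenly spaced grid that includes
    both endpoints, -radius and +radius.
    """
    base = loss(w_star)
    deltas = [radius * (2.0 * i / (n - 1) - 1.0) for i in range(n)]
    return max(loss(w_star + d) - base for d in deltas)


if __name__ == "__main__":
    s_sharp = sharpness(sharp_loss, 1.0)
    s_flat = sharpness(flat_loss, -1.0)
    print(f"sharp basin: {s_sharp:.4f}")  # about 0.5
    print(f"flat basin:  {s_flat:.4f}")   # about 0.005
```

For a quadratic basin this worst-case increase equals (curvature / 2) × radius², so the sharp basin pays 100 times more for the same perturbation. In many dimensions the same quantity is governed by the largest Hessian eigenvalue.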

Real-world example

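Deep learning models often generalize better when trained with noise, such as the gradient noise of small-batch SGD, which tends to steer training away from sharp minima and toward flatter ones.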
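One way to see why noise can favor flat minima is to look at the objective that weight-noise injection (one form of training with noise) effectively optimizes: the loss averaged over random parameter perturbations. The sketch below assumes a hypothetical 1-D landscape whose sharp basin is slightly deeper than its flat basin. It uses a symmetric two-point perturbation of ±sigma, which for a quadratic basin gives the same average as zero-mean Gaussian noise with standard deviation sigma. All names and numbers are illustrative.

```python
def landscape(w: float) -> float:
    # Hypothetical 1-D loss with two basins:
    #   sharp basin: minimum 0.00 at w = 1,  curvature 100
    #   flat basin:  minimum 0.02 at w = -2, curvature 1
    sharp = 50.0 * (w - 1.0) ** 2
    flat = 0.02 + 0.5 * (w + 2.0) ** 2
    return min(sharp, flat)


def noise_averaged(loss, w: float, sigma: float = 0.2) -> float:
    # Average loss over the perturbations +sigma and -sigma. For a quadratic
    # a * (w - c)**2 this equals the Gaussian average a * ((w - c)**2 + sigma**2).
    return 0.5 * (loss(w + sigma) + loss(w - sigma))


def argmin_on_grid(f, grid):
    # Return the grid point with the smallest value of f.
    return min(grid, key=f)


if __name__ == "__main__":
    grid = [i / 100 for i in range(-300, 201)]  # w from -3.00 to 2.00
    w_raw = argmin_on_grid(landscape, grid)
    w_noisy = argmin_on_grid(lambda w: noise_averaged(landscape, w), grid)
    print("minimizer of raw loss:           ", w_raw)    # 1.0 (sharp basin)
    print("minimizer of noise-averaged loss:", w_noisy)  # -2.0 (flat basin)
```

The deeper sharp basin wins on the raw loss, but once the loss is averaged over parameter noise each basin pays an extra curvature × sigma² / 2 (2.0 for the sharp basin, 0.02 for the flat one), so the flat basin becomes the preferred solution.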

Common mistakes

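Assuming that the lowest training loss always means the best model. A sharp minimum can reach a lower training loss than a flat one and still generalize worse.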
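A toy illustration of this mistake, assuming a simplification often used to explain it: the test loss surface is treated as the training loss surface shifted slightly in parameter space. The shift size, both basins, and the helper names are hypothetical. The sharp solution has the lower training loss but the higher test loss.

```python
SHIFT = 0.1  # hypothetical train/test mismatch, expressed in parameter space


def train_sharp(w: float) -> float:
    # Narrow basin with the lower training loss: 0.00 at w = 1.
    return 50.0 * (w - 1.0) ** 2


def train_flat(w: float) -> float:
    # Wide basin with a slightly higher training loss: 0.05 at w = -1.
    return 0.05 + 0.5 * (w + 1.0) ** 2


def shifted_loss(train_loss, w: float) -> float:
    # Model the test surface as the training surface shifted by SHIFT.
    return train_loss(w + SHIFT)


if __name__ == "__main__":
    for name, loss, w_star in [("sharp", train_sharp, 1.0), ("flat", train_flat, -1.0)]:
        print(f"{name}: train={loss(w_star):.3f}  test={shifted_loss(loss, w_star):.3f}")
    # sharp: train=0.000  test=0.500
    # flat: train=0.050  test=0.055
```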

Follow-up questions

Why do flat minima generalize better?
How can training be encouraged to find flat minima?
