Senior | Gradient Descent
What is the training stability threshold in gradient descent?
Updated May 16, 2026
Short answer
It is the boundary in hyperparameter space (most notably the learning rate) beyond which training stops converging and instead oscillates or diverges.
Deep explanation
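Training stability depends on the learning rate, the batch size, and the curvature of the loss. Beyond a certain threshold, updates oscillate or diverge. For full-batch gradient descent on a locally quadratic loss, the step size must stay below 2/λ_max, where λ_max is the largest eigenvalue (the spectral radius, when the Hessian is positive semidefinite) of the Hessian. More generally, if the gradient is Lipschitz continuous with constant L, then L bounds the curvature and a step size below 2/L is stable. For real networks the curvature changes during training, so the threshold is usually determined empirically.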
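The link between curvature and the learning-rate limit can be checked on a toy problem. The sketch below is illustrative only: it assumes a quadratic loss with a diagonal Hessian and a made-up eigenvalue spectrum, runs full-batch gradient descent just below and just above 2/λ_max, and then estimates λ_max with power iteration using only Hessian-vector products. The names gd_on_quadratic, power_iteration, and hvp are hypothetical helpers, not part of any library.

```python
import math


def gd_on_quadratic(eigenvalues, lr, steps=200):
    """Run gradient descent on f(x) = 0.5 * sum(lam_i * x_i**2).

    The Hessian is diagonal with the given eigenvalues, so each coordinate
    evolves as x_i <- (1 - lr * lam_i) * x_i.
    Returns the final distance from the minimizer at the origin.
    """
    x = [1.0] * len(eigenvalues)
    for _ in range(steps):
        # The gradient of the quadratic is lam_i * x_i in each coordinate.
        x = [xi - lr * lam * xi for xi, lam in zip(x, eigenvalues)]
    return math.sqrt(sum(xi * xi for xi in x))


def power_iteration(matvec, dim, iters=100):
    """Estimate the largest-magnitude eigenvalue from matrix-vector products."""
    v = [1.0 / math.sqrt(dim)] * dim
    lam = 0.0
    for _ in range(iters):
        w = matvec(v)
        lam = sum(wi * vi for wi, vi in zip(w, v))  # Rayleigh quotient
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return lam


# Assumed curvature spectrum, chosen only for illustration.
eigenvalues = [0.5, 2.0, 10.0]
threshold = 2.0 / max(eigenvalues)

below = gd_on_quadratic(eigenvalues, lr=0.9 * threshold)  # shrinks toward 0
above = gd_on_quadratic(eigenvalues, lr=1.1 * threshold)  # grows without bound


def hvp(v):
    """Hessian-vector product for the diagonal toy Hessian."""
    return [lam * vi for lam, vi in zip(eigenvalues, v)]


lam_max_estimate = power_iteration(hvp, dim=len(eigenvalues))

print(f"threshold lr = {threshold}")
print(f"distance after training, lr below threshold: {below:.3e}")
print(f"distance after training, lr above threshold: {above:.3e}")
print(f"estimated lambda_max: {lam_max_estimate:.6f}")
```

On a real network the Hessian is not available explicitly, but the same power iteration could in principle be driven by Hessian-vector products from automatic differentiation. Mini-batch noise and non-quadratic losses mean the resulting 2/λ_max figure is a guide rather than an exact limit.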
Real-world example
Transformer training that diverges (loss spikes or NaN values) when the learning rate is set too high.
Common mistakes
- Assuming stability depends only on the learning rate, when batch size and loss curvature also shift the threshold.
Follow-up questions
- What affects the stability threshold?
- How would you estimate it?