What is the training stability threshold in Gradient Descent?

Updated May 16, 2026

Short answer

It is the boundary in hyperparameter space, most often a maximum learning rate, beyond which gradient descent updates diverge or oscillate instead of converging.

Deep explanation

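Training stability depends on the learning rate, the batch size, and the curvature of the loss. For plain gradient descent on a quadratic loss, the iterates converge only if the learning rate is below 2/λ_max, where λ_max is the largest eigenvalue (the spectral radius) of the Hessian. Above that value the update along the sharpest direction grows at every step; just below it the iterates oscillate while decaying slowly. For a general loss whose gradient is L-Lipschitz, a learning rate of at most 1/L guarantees that a step does not increase the loss. Batch size matters because smaller batches give noisier gradient estimates, which typically lowers the learning rate that can be used safely. In deep networks the curvature changes during training, so the threshold is usually determined empirically.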
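To make the threshold concrete, here is a minimal sketch of gradient descent on a quadratic loss f(x) = 0.5 * x^T H x. The matrix H, the helper name run_gd, and the step count are illustrative assumptions, not part of any library. The sketch runs once just below and once just above the learning rate 2/λ_max.

```python
import numpy as np

def run_gd(hessian, lr, steps=200):
    """Run gradient descent on f(x) = 0.5 * x^T H x and return the final loss."""
    x = np.ones(hessian.shape[0])
    for _ in range(steps):
        grad = hessian @ x      # gradient of the quadratic at x
        x = x - lr * grad       # plain gradient descent update
    return 0.5 * x @ hessian @ x

# Hypothetical 2-D problem with curvatures 1 and 10 (an assumption for illustration).
H = np.diag([1.0, 10.0])
lam_max = np.linalg.eigvalsh(H).max()
threshold = 2.0 / lam_max       # stability limit for this quadratic

stable_loss = run_gd(H, lr=0.9 * threshold)    # just below the limit
unstable_loss = run_gd(H, lr=1.1 * threshold)  # just above the limit
print(threshold, stable_loss, unstable_loss)
```

With the learning rate 10% below the limit the loss shrinks toward zero; 10% above it, the component along the sharpest direction is multiplied by a factor larger than 1 in magnitude at every step and the loss blows up.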

Real-world example

Transformer training can diverge when the learning rate is too high, which typically shows up as a loss that spikes or becomes NaN instead of decreasing.
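For a large model the Hessian cannot be formed explicitly, so one common way to judge whether a learning rate is too high is to estimate the largest Hessian eigenvalue by power iteration on Hessian-vector products. The sketch below assumes a local quadratic approximation, under which the limit is roughly 2/λ_max. The function name estimate_lambda_max and the small matrix standing in for a real Hessian-vector product are hypothetical.

```python
import numpy as np

def estimate_lambda_max(hvp, dim, iters=100, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue by power iteration.

    `hvp` is any function that returns the Hessian-vector product H @ v.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    eigenvalue = 0.0
    for _ in range(iters):
        hv = hvp(v)
        eigenvalue = v @ hv              # Rayleigh quotient for the current unit vector
        v = hv / np.linalg.norm(hv)      # renormalise for the next iteration
    return eigenvalue

# Hypothetical stand-in for a model's Hessian-vector product.
H_demo = np.diag([0.5, 2.0, 8.0])
lam_est = estimate_lambda_max(lambda v: H_demo @ v, dim=3)
lr_limit = 2.0 / lam_est                 # approximate stability limit
print(lam_est, lr_limit)
```

In a deep learning framework the lambda would be replaced by an automatic-differentiation Hessian-vector product, and the estimate would need to be refreshed during training because the curvature changes.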

Common mistakes

  • Assuming stability depends only on the learning rate, when batch size and curvature also matter.

Follow-up questions

  • What affects the stability threshold?
  • How to estimate it?
