Senior · Gradient Descent
What is learning rate warmup?
Updated May 16, 2026
Short answer
Warmup gradually increases the learning rate from a small value to its target value over the first steps of training.
Deep explanation
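Learning rate warmup reduces instability in early training by starting with a small learning rate and increasing it, often linearly, to the target learning rate over an initial number of steps. Early in training the weights are still close to their random initialization, gradients can be large and noisy, and adaptive optimizers such as Adam have not yet accumulated reliable gradient statistics, so full-size steps can cause the loss to spike or diverge. Warmup is especially useful for large-scale deep learning models and for training with large batch sizes.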
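A minimal sketch of a linear warmup schedule may make the idea concrete. The function name `warmup_lr`, its parameters, and the example values (`target_lr=1e-3`, `warmup_steps=4`) are illustrative assumptions, not taken from any particular library, and linear ramping is only one common choice of warmup shape.

```python
def warmup_lr(step: int, target_lr: float, warmup_steps: int) -> float:
    """Return the learning rate for a zero-based training step.

    Assumed behavior: ramp linearly from target_lr / warmup_steps
    up to target_lr, then hold target_lr constant.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        # Warmup disabled or already finished: use the full learning rate.
        return target_lr
    # Fraction of the warmup completed after this step, in (0, 1].
    return target_lr * (step + 1) / warmup_steps


# Hypothetical settings chosen only to keep the example short.
schedule = [warmup_lr(s, target_lr=1e-3, warmup_steps=4) for s in range(6)]
for step, lr in enumerate(schedule):
    print(f"step {step}: lr = {lr:.6f}")
```

In a real training loop the returned value would be assigned to the optimizer's learning rate before each update. This sketch holds the rate constant once warmup ends; in practice a decay schedule usually takes over at that point.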
Real-world example
Transformer models such as BERT and GPT are typically trained with a warmup phase before the learning rate begins to decay.
Common mistakes
- Skipping warmup, which can cause the loss to diverge early in training.
Follow-up questions
- Why is warmup needed?
- What comes after warmup?