Senior · Gradient Descent
What is gradient clipping and why is it used?
Updated May 16, 2026
Short answer
Gradient clipping caps the size of the gradients before the parameter update, so that an exploding gradient cannot produce a huge, destabilizing step.
Deep explanation
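Gradient clipping checks the norm of the gradient before each optimizer step. If the norm ‖g‖ exceeds a threshold c, the gradient is rescaled to g · c / ‖g‖, which keeps its direction but caps its length at c; gradients already below the threshold are left unchanged. This prevents a single unusually large gradient from producing an unstable update in deep networks, especially RNNs and transformers.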
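The rescaling rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the gradient is a flat list of floats and the L2 norm is used; `clip_by_norm` is a hypothetical helper name, not a library function.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm, keeping its direction.
    (Hypothetical helper for illustration; grad is a flat list of floats.)"""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)  # within the threshold: returned unchanged
    # g * c / ||g||: same direction, length capped at max_norm
    return [g * max_norm / norm for g in grad]

grad = [3.0, 4.0]  # L2 norm is 5.0
print(clip_by_norm(grad, max_norm=1.0))
```

Because every component is multiplied by the same factor, the clipped gradient points in the same direction as the original; only its length changes.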
Real-world example
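Training large language models: a rare batch can produce a gradient spike that causes a loss spike or outright divergence, so LLM training runs commonly clip the global gradient norm (computed over all parameters together), often with a threshold of 1.0.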
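Global-norm clipping, as used in this setting, can be sketched as follows. This is a minimal sketch, assuming each parameter's gradient is represented as a plain list of floats (a hypothetical stand-in for real tensors); `clip_grad_global_norm` is a hypothetical name. In PyTorch the comparable built-in is `torch.nn.utils.clip_grad_norm_(parameters, max_norm)`, which also returns the total norm measured before clipping. The small epsilon in the denominator is an assumption of this sketch to avoid dividing by zero.

```python
import math

def clip_grad_global_norm(grads, max_norm, eps=1e-6):
    """Scale every gradient group by one shared factor so that the global L2
    norm over all groups is at most max_norm (hypothetical helper).
    Returns (clipped_grads, total_norm_before_clipping)."""
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    # One factor for all groups keeps the overall update direction intact
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in group] for group in grads]
    return clipped, total_norm

# Hypothetical "spike" step: two parameter groups with a global norm of 5.0
spike = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before = clip_grad_global_norm(spike, max_norm=1.0)
print(norm_before, clipped)
```

Logging `norm_before` at every step is a common way to spot spikes and to check how often clipping actually fires.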
Common mistakes
- Setting the threshold too low, so that nearly every step is clipped: updates collapse to a small, fixed length and learning slows down.
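Why an overly tight threshold slows learning can be seen from the length of a plain SGD step, which after clipping is lr · min(‖g‖, c). A minimal sketch, assuming vanilla SGD and a few hypothetical gradient norms; `step_length` is a hypothetical helper name.

```python
def step_length(grad_norm, lr, max_norm):
    """Length of the SGD update lr * g after clipping g to norm <= max_norm."""
    return lr * min(grad_norm, max_norm)

lr = 0.1
norms = [0.5, 2.0, 8.0]  # hypothetical gradient norms seen during training

for max_norm in (10.0, 0.1):  # a loose threshold vs. a very tight one
    lengths = [step_length(n, lr, max_norm) for n in norms]
    clipped_fraction = sum(n > max_norm for n in norms) / len(norms)
    print(f"max_norm={max_norm}: steps={lengths}, clipped={clipped_fraction:.0%}")
```

With the loose threshold the step length still tracks the gradient norm; with the tight one every step is clipped and has the same length lr · max_norm, so clipping behaves like a much smaller learning rate.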
Follow-up questions
- What types of clipping exist?
- Why is it important for RNNs?