What is gradient clipping and why is it used?

Updated May 16, 2026

Short answer

Gradient clipping caps the magnitude of gradients before the optimizer step, so that an unusually large (exploding) gradient cannot cause a destabilizing parameter update.

Deep explanation

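Norm-based gradient clipping rescales the gradient whenever its norm exceeds a chosen threshold: the gradient keeps its direction, but its length is shrunk to the threshold. Gradients at or below the threshold pass through unchanged. This prevents unstable updates in deep networks, especially RNNs, where backpropagation through many time steps can make gradients grow exponentially, and transformers.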
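The rescaling rule can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function name `clip_by_global_norm`, the list-of-lists gradient layout, and the threshold value are all assumptions made for the example.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradient vectors so their combined L2 norm is at most max_norm.

    grads is a list of per-parameter gradient vectors (hypothetical layout).
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    # Global L2 norm over every gradient entry of every parameter.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        # At or below the threshold: return an unchanged copy.
        return [list(vec) for vec in grads], total_norm
    # Above the threshold: one shared scale factor keeps the direction intact.
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm

grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Because every entry is multiplied by the same factor, the ratios between gradient components are preserved; only the overall step length changes.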

Real-world example

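Large language model training runs typically clip the global gradient norm so that an occasional gradient spike, for example from an unusual batch, does not destabilize training.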

Common mistakes

  • Clipping too aggressively: a threshold set so low that nearly every step is clipped shrinks the updates and slows learning.
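One way to check whether a threshold is too aggressive is to track how often clipping actually triggers. The sketch below assumes the per-step gradient norms have already been logged; the helper name `clipped_fraction` and the sample norms are hypothetical.

```python
def clipped_fraction(grad_norms, max_norm):
    """Fraction of steps whose gradient norm exceeded max_norm (were clipped)."""
    if not grad_norms:
        return 0.0
    return sum(1 for n in grad_norms if n > max_norm) / len(grad_norms)

# Hypothetical per-step gradient norms with a single spike.
norms = [0.8, 1.2, 0.9, 5.0, 1.1]

print(clipped_fraction(norms, max_norm=2.0))  # only the spike is clipped
print(clipped_fraction(norms, max_norm=0.5))  # every step is clipped
```

If the fraction is close to 1, the threshold is effectively setting the step size on every update rather than guarding against rare spikes, which is a sign it may be too low.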

Follow-up questions

  • What types of clipping exist?
  • Why is it important for RNNs?
