What is the vanishing gradient problem?
Updated May 15, 2026
Short answer
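During backpropagation, gradients shrink as they are propagated backward through many layers, until they are too small to update the early layers' parameters effectively.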
Deep explanation
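It occurs mainly in deep networks that use saturating activation functions such as sigmoid or tanh. Backpropagation multiplies one activation derivative per layer, and these derivatives are small: the sigmoid derivative never exceeds 0.25, and the tanh derivative approaches 0 once the unit saturates. The product therefore shrinks roughly exponentially with depth, so the earliest layers learn very slowly or stop learning.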
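The shrinkage can be made concrete with a short derivation. This is a simplified sketch that assumes a chain of scalar sigmoid layers; the symbols a_l, z_l, w_l, b_l and the depth n are introduced here for illustration and do not come from the original text.

```latex
% Assumed setup: scalar layers, a_l = \sigma(z_l), z_l = w_l a_{l-1} + b_l, for l = 1, ..., n.
\[
\frac{\partial a_n}{\partial a_0}
  = \prod_{l=1}^{n} \frac{\partial a_l}{\partial a_{l-1}}
  = \prod_{l=1}^{n} w_l \, \sigma'(z_l),
\qquad
\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) \le \tfrac{1}{4}.
\]
\[
\left| \frac{\partial a_n}{\partial a_0} \right|
  \le \frac{1}{4^{\,n}} \prod_{l=1}^{n} |w_l| .
\]
```

Under these assumptions, the gradient decays exponentially in n unless the weights are large enough to compensate (on average |w_l| > 4 in this scalar case).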
Real-world example
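A recurrent neural network trained on long sequences fails to learn long-range dependencies, because the gradient that reaches the early time steps is close to zero.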
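The effect on long sequences can be illustrated with a toy model. The sketch below assumes a one-unit tanh recurrence h_t = tanh(w * h_{t-1} + x_t); the function name `gradient_through_time` and the values of `w` and the inputs are hypothetical choices for illustration, not taken from any particular library or model.

```python
import math
from typing import Sequence


def gradient_through_time(w: float, inputs: Sequence[float]) -> float:
    """Return d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1} + x_t)."""
    h = 0.0      # initial hidden state h_0
    grad = 1.0   # running product of per-step derivatives
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule for one step: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad


# Compare how much gradient survives over short and long sequences.
for steps in (5, 20, 50):
    print(steps, gradient_through_time(0.9, [0.5] * steps))
```

Each step multiplies the gradient by a factor smaller than |w|, so with |w| < 1 the signal from the first time step fades as the sequence grows.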
Common mistakes
- Using saturating activations (sigmoid, tanh) in the hidden layers of deep models.
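To see why saturating activations are the risky choice, the following sketch compares the backpropagated gradient through a chain of scalar layers using sigmoid versus ReLU. ReLU is brought in here only as a familiar non-saturating contrast; the function `chain_gradient` and the default weight and input of 1.0 are assumptions made for illustration.

```python
import math


def chain_gradient(activation: str, depth: int, w: float = 1.0, x: float = 1.0) -> float:
    """Return d a_depth / d x for the scalar chain a_l = act(w * a_{l-1}), with a_0 = x."""
    a = x
    grad = 1.0
    for _ in range(depth):
        z = w * a
        if activation == "sigmoid":
            a = 1.0 / (1.0 + math.exp(-z))
            local = a * (1.0 - a)          # sigmoid derivative, at most 0.25
        elif activation == "relu":
            a = max(0.0, z)
            local = 1.0 if z > 0 else 0.0  # ReLU derivative, 1 for positive inputs
        else:
            raise ValueError(f"unknown activation: {activation}")
        grad *= w * local                  # chain rule for one layer
    return grad


# Print the surviving gradient for each activation at increasing depth.
for depth in (1, 5, 20):
    print(depth, chain_gradient("sigmoid", depth), chain_gradient("relu", depth))
```

In this toy setting the ReLU chain passes the gradient through unchanged, while the sigmoid chain loses at least a factor of 4 per layer. Real networks have weight matrices and varied inputs, so the exact numbers differ, but the trend is the same.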
Follow-up questions
- How to solve it?
- Why does it happen?