What is the vanishing gradient problem?
Updated May 16, 2026
Short answer
The vanishing gradient problem occurs when gradients shrink toward zero as they are propagated backward through many layers, so the earliest layers receive almost no update signal.
Deep explanation
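In deep networks, backpropagation applies the chain rule, so the gradient reaching an early layer is a product of one term per later layer, each involving that layer's weights and activation derivative. When these terms are consistently smaller than 1, the product shrinks roughly exponentially with depth, slowing or stopping learning in the early layers. This is common with sigmoid and tanh activations: the sigmoid derivative never exceeds 0.25, and both functions saturate, with derivatives near zero for inputs of large magnitude.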
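To make the shrinkage concrete, here is a minimal sketch that multiplies the largest possible sigmoid derivative (0.25, reached at input 0) once per layer. It deliberately ignores the weight terms, so it is a best-case bound on the activation part of the gradient, not a simulation of a real network. The helper name gradient_upper_bound is made up for this illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x)), which peaks at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_upper_bound(num_layers: int) -> float:
    """Hypothetical helper: best-case product of sigmoid derivatives
    across num_layers layers. Weights are ignored (assumed to be 1)."""
    return sigmoid_derivative(0.0) ** num_layers


# Print how quickly the bound decays as the network gets deeper.
for depth in (1, 5, 10, 20):
    print(f"{depth:2d} layers: {gradient_upper_bound(depth):.3e}")
```

Even in this best case the bound drops by a factor of 4 per layer. In a real network the weights also enter the product, so the actual gradient can be larger or smaller than this bound suggests.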
Real-world example
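Deep neural networks, and recurrent networks on long sequences in particular, failing to learn long-term dependencies because the error signal from distant layers or time steps has shrunk to nearly zero by the time it arrives.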
Common mistakes
- Using sigmoid in deep networks without mitigation.
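As a rough illustration of why the activation choice matters, the sketch below compares the gradient through a 20-step scalar chain using sigmoid and using ReLU, one commonly cited alternative. The setup is an assumption made for clarity: a single unit per layer, a fixed weight of 1.0, and a positive input. The function chain_gradient is a hypothetical helper, not part of any library.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)


def relu(z: float) -> float:
    return max(0.0, z)


def relu_derivative(z: float) -> float:
    # ReLU passes the gradient unchanged for positive inputs and blocks it otherwise.
    return 1.0 if z > 0.0 else 0.0


def chain_gradient(activation, derivative, x: float, depth: int, weight: float = 1.0) -> float:
    """Hypothetical helper: d(output)/d(input) for h = activation(weight * h),
    applied depth times, computed with the chain rule."""
    grad = 1.0
    h = x
    for _ in range(depth):
        z = weight * h
        grad *= weight * derivative(z)  # one chain-rule factor per layer
        h = activation(z)
    return grad


sigmoid_grad = chain_gradient(sigmoid, sigmoid_derivative, x=0.5, depth=20)
relu_grad = chain_gradient(relu, relu_derivative, x=0.5, depth=20)

print(f"sigmoid chain gradient: {sigmoid_grad:.3e}")
print(f"ReLU chain gradient:    {relu_grad:.3e}")
```

In this toy setup the ReLU chain keeps the gradient at 1 because every pre-activation stays positive, while the sigmoid chain multiplies in a factor below 0.25 at each step. ReLU is not a complete fix: its derivative is 0 for negative inputs, so units can stop passing gradient altogether.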
Follow-up questions
- How do you fix it?
- What is the exploding gradient problem?