What is the vanishing gradient problem?

Updated May 15, 2026

Short answer

During backpropagation, gradients shrink as they pass back through many layers and become too small to update the early layers' parameters effectively.

Deep explanation

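It occurs in deep networks, especially those that use saturating activation functions such as sigmoid or tanh. Backpropagation multiplies one local derivative per layer. The sigmoid derivative is at most 0.25, and the tanh derivative falls well below 1 once a unit saturates, so the product shrinks roughly exponentially with depth. The layers closest to the input then learn very slowly or stop learning.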
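A minimal NumPy sketch of this effect, under stated assumptions: a stack of fully connected sigmoid layers with randomly initialized weights. The function name gradient_norms, the width of 32, the depth of 20, and the weight scale are all illustrative choices, not part of any particular library or model. The code backpropagates a vector of ones from the output and records the gradient norm at each layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def gradient_norms(depth, width=32, seed=0):
    """Return gradient norms per layer, ordered from the output back to the input.

    Hypothetical setup: `depth` square sigmoid layers with Gaussian weights.
    """
    rng = np.random.default_rng(seed)
    weights = [
        rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        for _ in range(depth)
    ]

    # Forward pass: keep the pre-activations, which the backward pass needs.
    h = rng.normal(size=width)
    pre_acts = []
    for W in weights:
        z = W @ h
        pre_acts.append(z)
        h = sigmoid(z)

    # Backward pass: each layer multiplies the gradient by sigmoid'(z) and W^T.
    grad = np.ones(width)
    norms = []
    for W, z in zip(reversed(weights), reversed(pre_acts)):
        grad = W.T @ (grad * sigmoid_grad(z))
        norms.append(float(np.linalg.norm(grad)))
    return norms

if __name__ == "__main__":
    norms = gradient_norms(depth=20)
    print("gradient norm near the output:", norms[0])
    print("gradient norm near the input: ", norms[-1])
```

In this setup the norm near the input should come out many orders of magnitude smaller than the norm near the output, which is the vanishing gradient in miniature. The exact numbers depend on the seed and the weight scale.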

Real-world example

Recurrent neural networks failing to learn long-range dependencies in long sequences, because the gradient is backpropagated through many time steps.

Common mistakes

  • Using saturating activations (sigmoid, tanh) in the hidden layers of deep models.
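To see why saturating activations are the usual culprit, the sketch below compares the product of local derivatives along a chain of 30 scalar units for sigmoid, tanh, and ReLU. It is a simplification under stated assumptions: every weight is fixed at 1 and every pre-activation is set to 2.0, both hypothetical values chosen only for illustration. ReLU is brought in here as a common non-saturating alternative; it is not mentioned in the text above.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    # 1 for positive inputs, 0 otherwise.
    return (z > 0).astype(float)

def chain_gradient(activation_grad, pre_activations):
    """Product of local derivatives along a chain of scalar units.

    Assumes all weights equal 1, so only the activation derivatives matter.
    """
    return float(np.prod(activation_grad(pre_activations)))

# Hypothetical chain: 30 layers, each with pre-activation 2.0.
z = np.full(30, 2.0)

if __name__ == "__main__":
    print("sigmoid:", chain_gradient(sigmoid_grad, z))
    print("tanh:   ", chain_gradient(tanh_grad, z))
    print("relu:   ", chain_gradient(relu_grad, z))
```

With positive pre-activations the ReLU product stays at 1, while the sigmoid and tanh products collapse toward zero. ReLU is not a cure-all: its derivative is 0 for negative inputs, so units can stop updating for a different reason.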

Follow-up questions

  • How can it be solved?
  • Why does it happen?
