midCost Function

What is vanishing gradient problem?

Updated May 15, 2026

Short answer

Gradients become too small to update parameters effectively.

Deep explanation

Occurs in deep networks with sigmoid/tanh activation functions, slowing learning.

Real-world example

Deep neural networks failing to learn long sequences.

Common mistakes

Using saturating activations in deep models.

Follow-up questions

How to solve it?
Why does it happen?

More Cost Function interview questions

How do modern alignment objectives combine multiple cost functions in LLMs?senior
What is the role of curvature in high-dimensional optimization landscapes?senior
How do attention mechanisms reshape optimization dynamics in transformers?senior
What is implicit bias of optimization algorithms in cost minimization?senior
What is entropy-temperature duality in cost function optimization?senior
How do loss landscapes evolve during training in deep neural networks?senior
What is Fisher-Rao geometry and why is it important in cost functions?senior
How does stochastic gradient descent converge to a continuous stochastic differential equation?senior