seniorGradient Descent
What is gradient descent in non-convex optimization?
Updated May 16, 2026
Short answer
It refers to optimizing functions with multiple local minima and saddle points.
Deep explanation
Non-convex optimization is challenging because Gradient Descent may converge to local minima or saddle points instead of global minima. Modern deep learning relies on heuristics like initialization, momentum, and stochasticity to handle this.
Real-world example
Training deep neural networks and transformer models.
Common mistakes
- Assuming convergence guarantees to global optimum.
Follow-up questions
- Why is deep learning non-convex?
- How is it still effective?