senior | Gradient Descent
How does second-order optimization compare to Gradient Descent?
Updated May 16, 2026
Short answer
Second-order methods use curvature (Hessian) information, while Gradient Descent uses only first-order gradients.
Deep explanation
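Gradient Descent updates parameters using only first derivatives (the gradient), stepping in the direction of steepest descent scaled by a learning rate. Second-order methods such as Newton's method also use the Hessian matrix of second derivatives, which describes local curvature and rescales the step accordingly. Near a well-behaved optimum this gives much faster convergence (quadratic for Newton's method versus linear for Gradient Descent). The cost is the catch: for a model with n parameters the Hessian has n^2 entries, and solving the Newton system directly takes on the order of n^3 operations, which is impractical for high-dimensional problems like deep learning.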
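As a rough illustration of the two update rules, the sketch below minimizes a small quadratic with both methods. The matrix `A`, vector `b`, starting point, and learning rate are illustrative assumptions, not values from any particular problem, and the helper names are hypothetical.

```python
# Minimize f(x) = 0.5 * x^T A x - b^T x.
# Its gradient is A x - b and its Hessian is the constant matrix A.
A = [[10.0, 2.0], [2.0, 1.0]]  # symmetric positive definite (illustrative values)
b = [1.0, 1.0]


def gradient(x):
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


def solve_2x2(M, v):
    """Solve M s = v for a 2x2 matrix M using Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]


def gradient_descent(x, lr=0.05, tol=1e-8, max_iter=10_000):
    """First-order update: x <- x - lr * gradient."""
    for k in range(max_iter):
        g = gradient(x)
        if max(abs(g[0]), abs(g[1])) < tol:
            return x, k
        x = [x[0] - lr * g[0], x[1] - lr * g[1]]
    return x, max_iter


def newton(x, tol=1e-8, max_iter=50):
    """Second-order update: x <- x - H^{-1} * gradient, with H = A here."""
    for k in range(max_iter):
        g = gradient(x)
        if max(abs(g[0]), abs(g[1])) < tol:
            return x, k
        step = solve_2x2(A, g)
        x = [x[0] - step[0], x[1] - step[1]]
    return x, max_iter


x_gd, gd_iters = gradient_descent([5.0, 5.0])
x_nt, newton_iters = newton([5.0, 5.0])
print("gradient descent:", x_gd, "after", gd_iters, "iterations")
print("newton:          ", x_nt, "after", newton_iters, "iterations")
```

On an exactly quadratic objective the Newton step lands on the minimizer in one update, because the Hessian captures the curvature completely. Gradient Descent needs many small steps along the flat direction of this poorly conditioned problem.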
Real-world example
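Newton-type methods are practical for small-scale problems such as fitting logistic regression in classical statistics, where there are few enough parameters that forming and solving with the Hessian is cheap. The standard iteratively reweighted least squares (IRLS) fit is a Newton-type method.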
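A minimal sketch of that idea, assuming a one-feature logistic regression with an intercept, fitted by plain Newton steps on the negative log-likelihood. The toy data and the function name `fit_logistic_newton` are made up for illustration. There is no line search or regularization, so this is not a production fitter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_newton(xs, ys, tol=1e-8, max_iter=25):
    """Fit p(y=1|x) = sigmoid(w0 + w1*x) with undamped Newton steps."""
    w0, w1 = 0.0, 0.0
    for it in range(max_iter):
        g0 = g1 = 0.0           # gradient of the negative log-likelihood
        h00 = h01 = h11 = 0.0   # entries of the symmetric 2x2 Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(w0 + w1 * x)
            r = p - y            # residual
            s = p * (1.0 - p)    # curvature weight
            g0 += r
            g1 += r * x
            h00 += s
            h01 += s * x
            h11 += s * x * x
        if max(abs(g0), abs(g1)) < tol:
            return (w0, w1), it
        # Solve H d = g for the 2x2 system, then take the Newton step w <- w - d.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        w0 -= d0
        w1 -= d1
    return (w0, w1), max_iter


# Toy data with overlapping classes, so a finite maximum-likelihood fit exists.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

(w0, w1), iters = fit_logistic_newton(xs, ys)
print("intercept:", w0, "slope:", w1, "iterations:", iters)
```

With only two parameters the Hessian is a 2x2 matrix, so each Newton step is trivial to compute. That is why second-order fitting works well in this setting.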
Common mistakes
- Assuming second-order methods are always better: they scale poorly to deep learning models.
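To make the scaling problem concrete, here is a back-of-the-envelope sketch of the memory needed to store a dense Hessian compared with a gradient. It assumes 4 bytes per entry (32-bit floats). The helper names are hypothetical.

```python
def gradient_memory_bytes(n_params, bytes_per_entry=4):
    """A gradient has one entry per parameter."""
    return n_params * bytes_per_entry


def hessian_memory_bytes(n_params, bytes_per_entry=4):
    """A dense Hessian has n_params * n_params entries."""
    return n_params * n_params * bytes_per_entry


for n in (1_000, 1_000_000, 100_000_000):
    grad_gb = gradient_memory_bytes(n) / 1e9
    hess_gb = hessian_memory_bytes(n) / 1e9
    print(f"{n:>11,} params: gradient {grad_gb:.6f} GB, dense Hessian {hess_gb:,.3f} GB")
```

Under these assumptions, a model with one million parameters would already need about 4 TB for the dense Hessian, while its gradient fits in 4 MB.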
Follow-up questions
- What is the Hessian matrix?
- Why is it expensive to compute and invert?