How does second-order optimization compare to Gradient Descent?

Updated May 16, 2026

Short answer

Second-order methods use curvature (Hessian) information, while Gradient Descent uses only first-order gradients.

Deep explanation

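Gradient Descent updates parameters using only the first derivative (the gradient): each step moves in the direction of steepest descent, scaled by a learning rate. Second-order methods such as Newton's method also use the Hessian matrix (the matrix of second derivatives), which rescales the step according to the local curvature of the loss. Near an optimum this gives much faster convergence (quadratic for Newton's method under standard smoothness conditions, versus linear for Gradient Descent), but computing, storing, and solving with the Hessian is expensive for high-dimensional problems such as deep learning.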
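To make the difference concrete, here is a minimal sketch comparing the two update rules on a small, poorly conditioned quadratic. The test function, the matrix A, the vector b, the learning rate, and the helper names are all assumptions chosen for illustration; they are not from any particular library.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x,
# with A symmetric positive definite (condition number 100).
A = np.array([[100.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    # First derivative of f.
    return A @ x - b

def hessian(x):
    # Second derivative of f; constant because f is quadratic.
    return A

def gradient_descent(x0, lr, tol=1e-8, max_iter=100_000):
    x = x0.copy()
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        x = x - lr * g                               # first-order step
    return x, max_iter

def newton(x0, tol=1e-8, max_iter=100):
    x = x0.copy()
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        x = x - np.linalg.solve(hessian(x), g)       # curvature-scaled step
    return x, max_iter

x0 = np.zeros(2)
x_gd, steps_gd = gradient_descent(x0, lr=0.01)       # lr must stay below 2/100 here
x_nt, steps_nt = newton(x0)
print("gradient descent steps:", steps_gd)
print("newton steps:", steps_nt)
```

On a quadratic, Newton's method reaches the minimizer in a single step, while Gradient Descent needs many more iterations because its learning rate is limited by the steepest direction. The sketch solves a linear system with the Hessian rather than forming its inverse, which is the usual choice for numerical stability.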

Real-world example

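Second-order methods are used in small-scale optimization problems. In classical statistics, logistic regression is typically fit with Newton's method (in the form of iteratively reweighted least squares), because with only a handful of parameters the Hessian is cheap to form and solve.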
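As an illustration of the logistic regression case, the following sketch fits a three-parameter model with Newton's method. The synthetic data, the true weights, the random seed, and the function name fit_logistic_newton are assumptions made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, tol=1e-8, max_iter=50):
    """Minimize the mean negative log-likelihood with Newton's method."""
    n, d = X.shape
    w = np.zeros(d)
    for k in range(max_iter):
        p = sigmoid(X @ w)                  # predicted probabilities
        g = X.T @ (p - y) / n               # gradient, shape (d,)
        if np.linalg.norm(g) < tol:
            return w, k
        s = p * (1.0 - p)                   # per-sample curvature weights
        H = (X * s[:, None]).T @ X / n      # Hessian, shape (d, d)
        w = w - np.linalg.solve(H, g)       # Newton step
    return w, max_iter

# Synthetic data (assumed for illustration): intercept plus two features.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
w_true = np.array([0.5, -1.0, 2.0])
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

w_hat, n_iter = fit_logistic_newton(X, y)
print("estimated weights:", w_hat)
print("newton iterations:", n_iter)
```

The Hessian here is only 3 x 3, so each Newton step costs little more than a gradient step. Note that this plain version has no line search or regularization, so it can fail to converge if the classes are perfectly separable.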

Common mistakes

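  • Assuming second-order methods are always better. For n parameters, a dense Hessian takes O(n²) memory and solving with it takes up to O(n³) time, so these methods scale poorly to deep learning models.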
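A quick back-of-the-envelope calculation shows why the scaling matters. This sketch assumes a dense Hessian stored in 32-bit floats; the function name and the parameter counts are hypothetical.

```python
def hessian_bytes(num_params: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store a dense num_params x num_params Hessian."""
    return num_params * num_params * bytes_per_entry

# Parameter counts chosen for illustration only.
for n in (1_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} parameters -> {hessian_bytes(n) / 1e9:,.3f} GB")
```

Under these assumptions, a model with one million parameters already needs about 4 TB just to store the Hessian, before any linear solve is attempted.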

Follow-up questions

  • What is the Hessian matrix?
  • Why is the Hessian expensive to compute and store?
