Why does gradient-based optimization depend on the linear algebra structure of the loss landscape?

Updated May 16, 2026

Short answer

Because the gradient is a vector, the curvature is a matrix (the Hessian), and every update is a vector operation in parameter space.

Deep explanation

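The loss in ML is a scalar function defined over a high-dimensional parameter space. Its gradient is the vector of partial derivatives, which points in the direction of steepest ascent, and its Hessian is the matrix of second derivatives, which describes local curvature. Optimizers such as SGD, Adam, and Newton’s method are built from linear algebra operations on these objects: SGD steps along the negative gradient, Adam rescales each gradient coordinate, and Newton’s method solves a linear system with the Hessian. Near a minimum, convergence is governed by the eigenvalues of the Hessian: for gradient descent, the ratio of the largest to the smallest eigenvalue (the condition number) determines how slowly the iterates converge.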
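To make the eigenvalue claim concrete, here is a minimal sketch that runs plain gradient descent on a quadratic loss f(x) = 0.5 xᵀHx, whose Hessian is exactly H. The function name `gd_iterations`, the step size of 1 / (largest eigenvalue), the tolerance, and the two example matrices are all assumptions chosen for illustration, not a prescribed method.

```python
import numpy as np

def gd_iterations(hessian, x0, tol=1e-6, max_iter=100_000):
    """Run gradient descent on f(x) = 0.5 * x^T H x; return steps until ||grad|| < tol."""
    eigvals = np.linalg.eigvalsh(hessian)  # symmetric H: real eigenvalues, ascending order
    step = 1.0 / eigvals[-1]               # assumed step size: 1 / largest eigenvalue
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        grad = hessian @ x                 # the gradient of this quadratic is H x
        if np.linalg.norm(grad) < tol:
            return k
        x = x - step * grad                # gradient descent update
    return max_iter

# Two hypothetical Hessians: condition number 2 versus condition number 100.
well_conditioned = np.diag([1.0, 2.0])
ill_conditioned = np.diag([1.0, 100.0])
x0 = [1.0, 1.0]

iters_well = gd_iterations(well_conditioned, x0)
iters_ill = gd_iterations(ill_conditioned, x0)
print("condition number 2:  ", iters_well, "iterations")
print("condition number 100:", iters_ill, "iterations")
```

With this step size, the error along the smallest-eigenvalue direction shrinks by a factor of 1 − λ_min/λ_max per step, so the ill-conditioned run should need far more iterations than the well-conditioned one.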

Real-world example

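Training large neural networks such as transformers relies on stable gradient flow: backpropagation multiplies the gradient by one Jacobian matrix per layer, so its norm can shrink or grow with depth depending on the singular values of those matrices.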
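As a toy illustration of gradient flow (a stack of random linear layers, not a transformer), the sketch below pushes a gradient vector backward through many layers and records its norm. The helper `backprop_norms`, the Gaussian weights, and the `scale` values are hypothetical choices made only to show how the per-layer gain compounds with depth.

```python
import numpy as np

def backprop_norms(depth, width, scale, seed=0):
    """Propagate a unit gradient backward through `depth` random linear layers.

    Returns the gradient norm after each layer.
    """
    rng = np.random.default_rng(seed)
    grad = rng.standard_normal(width)
    grad /= np.linalg.norm(grad)           # start from a unit-norm gradient
    norms = []
    for _ in range(depth):
        # Entries ~ N(0, scale^2 / width), so W.T @ grad has expected
        # squared norm scale^2 * ||grad||^2.
        W = rng.standard_normal((width, width)) * scale / np.sqrt(width)
        grad = W.T @ grad                  # chain rule for a linear layer y = W x
        norms.append(np.linalg.norm(grad))
    return norms

shrinking = backprop_norms(depth=50, width=64, scale=0.5)
growing = backprop_norms(depth=50, width=64, scale=2.0)
print("scale 0.5, final gradient norm:", shrinking[-1])
print("scale 2.0, final gradient norm:", growing[-1])
```

A per-layer gain below 1 drives the gradient toward zero and a gain above 1 makes it blow up, which is one reason practical architectures use careful initialization, normalization, and residual connections.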

Common mistakes

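  • Treating optimization as purely scalar tuning (for example, of a single learning rate) while ignoring that the gradient has a direction and that curvature differs from one direction to another.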

Follow-up questions

  • Why do eigenvalues affect convergence speed?
