Why does gradient-based optimization depend on linear algebra structure of the loss landscape?

Updated May 16, 2026

Short answer

Because model parameters live in a vector space: the gradient is a vector, curvature is a matrix (the Hessian), and every update step is a vector operation on them.

Deep explanation

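In ML, the loss is a scalar function defined on a high-dimensional parameter space. Its gradient is the vector of partial derivatives, and the directional derivative along any direction is the inner product of the gradient with that direction. The Hessian is the matrix of second derivatives and describes local curvature. SGD and Adam update the parameters with vector operations on gradient estimates (Adam also rescales each coordinate), while Newton’s method solves a linear system involving the Hessian. Near a minimum, convergence speed is governed by the Hessian's eigenvalues: the largest one limits the stable step size, and the ratio of largest to smallest (the condition number) determines how slowly gradient descent moves along flat directions.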
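To make the eigenvalue claim concrete, here is a minimal sketch, assuming a toy quadratic loss f(x) = 0.5 * x^T H x with a diagonal Hessian, so the eigenvalues are just the diagonal entries. The function names, starting point, tolerance, and step-size rule (1 / largest eigenvalue) are illustrative assumptions, not a standard API.

```python
# Toy example (assumed setup): gradient descent on f(x) = 0.5 * x^T H x
# where H is diagonal, so its eigenvalues are its diagonal entries.

def gradient(hessian_diag, x):
    """Gradient of 0.5 * x^T H x for diagonal H, i.e. H @ x computed entrywise."""
    return [h * xi for h, xi in zip(hessian_diag, x)]

def steps_to_converge(hessian_diag, lr, tol=1e-6, max_steps=100_000):
    """Iterate x <- x - lr * grad f(x) from x = (1, ..., 1).

    Returns the number of steps until the Euclidean norm of x drops below tol,
    or max_steps if that never happens.
    """
    x = [1.0] * len(hessian_diag)
    for step in range(1, max_steps + 1):
        g = gradient(hessian_diag, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        if sum(xi * xi for xi in x) ** 0.5 < tol:
            return step
    return max_steps

well_conditioned = [1.0, 2.0]    # eigenvalues 1 and 2, condition number 2
ill_conditioned = [1.0, 100.0]   # eigenvalues 1 and 100, condition number 100

# Step size 1 / lambda_max is stable for both problems.
fast = steps_to_converge(well_conditioned, lr=1.0 / max(well_conditioned))
slow = steps_to_converge(ill_conditioned, lr=1.0 / max(ill_conditioned))

print("well-conditioned steps:", fast)
print("ill-conditioned steps:", slow)
```

Both problems share the same smallest eigenvalue. The ill-conditioned one needs far more steps only because its largest eigenvalue forces a smaller step size, which slows progress along the flat direction.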

Real-world example

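Training large neural networks such as transformers relies on stable gradient flow: backpropagation multiplies Jacobian matrices layer by layer, so gradients can vanish or explode depending on the singular values of those matrices.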

Common mistakes

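Treating optimization as purely scalar tuning, for example adjusting only the learning rate, while ignoring that curvature differs across directions in parameter space.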

Follow-up questions

Why do eigenvalues affect convergence speed?
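A short worked derivation may help here. It is a sketch under the assumption of a quadratic loss with a symmetric positive definite Hessian H, eigenvalues λ_i, orthonormal eigenvectors v_i, and fixed step size η. These symbols are introduced for illustration only.

```latex
% Assumed model: f(x) = \tfrac{1}{2} x^\top H x, with H v_i = \lambda_i v_i.
\begin{align*}
  x_{k+1} &= x_k - \eta \nabla f(x_k) = (I - \eta H)\, x_k, \\
  x_k = \sum_i c_i^{(k)} v_i
    \;&\Longrightarrow\;
    c_i^{(k+1)} = (1 - \eta \lambda_i)\, c_i^{(k)}, \\
  \text{convergence} \;&\Longleftrightarrow\;
    |1 - \eta \lambda_i| < 1 \ \text{for all } i
    \;\Longleftrightarrow\; 0 < \eta < \frac{2}{\lambda_{\max}}, \\
  \eta = \frac{1}{\lambda_{\max}}
    \;&\Longrightarrow\;
    \text{slowest factor} = 1 - \frac{\lambda_{\min}}{\lambda_{\max}}
    = 1 - \frac{1}{\kappa}.
\end{align*}
```

Each eigen-direction shrinks independently by its own factor, so the largest eigenvalue caps the step size and the smallest sets the slowest rate. A large condition number κ pushes the slowest factor close to 1.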

