Senior · Linear Algebra
Why does gradient-based optimization depend on the linear-algebraic structure of the loss landscape?
Updated May 16, 2026
Short answer
Because the gradient is a vector, curvature is a matrix (the Hessian), and every update is a vector operation in parameter space.
Deep explanation
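In ML, the loss is a scalar function over a high-dimensional parameter space, and that parameter space is a vector space. The gradient is the vector of partial derivatives, and its inner product with a direction gives the directional derivative along that direction. The Hessian is the matrix of second derivatives and describes local curvature. Optimization algorithms like SGD, Adam, and Newton’s method are built from linear algebra operations on these objects: vector updates, elementwise rescaling of the gradient, and solving a linear system with the Hessian. Near a minimum, convergence speed depends on the eigenvalues of the Hessian, in particular on the ratio of the largest to the smallest (the condition number).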
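The dependence on Hessian eigenvalues can be seen in a minimal sketch, assuming a toy quadratic loss f(x) = 0.5 * x^T H x with a diagonal Hessian. The function names, the eigenvalue choices, and the step size 1 / lambda_max are illustrative assumptions, not part of any particular library or training setup.

```python
import numpy as np

def gradient_descent_quadratic(hessian, x0, lr, steps):
    """Plain gradient descent on the toy loss f(x) = 0.5 * x^T H x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        grad = hessian @ x   # gradient of the quadratic is H x
        x = x - lr * grad    # the update is a vector operation in parameter space
    return x

def run(eigenvalues, steps=100):
    """Return (condition number, distance to the minimum at 0) after `steps` updates."""
    hessian = np.diag(eigenvalues)       # hypothetical Hessian with known eigenvalues
    lam = np.linalg.eigvalsh(hessian)    # eigenvalues in ascending order
    lr = 1.0 / lam.max()                 # assumed step size; stability needs lr < 2 / lambda_max
    x = gradient_descent_quadratic(hessian, np.ones(len(eigenvalues)), lr, steps)
    return lam.max() / lam.min(), np.linalg.norm(x)

well_conditioned = run([1.0, 2.0])     # condition number 2
ill_conditioned = run([1.0, 100.0])    # condition number 100

print("cond, distance to minimum:", well_conditioned)
print("cond, distance to minimum:", ill_conditioned)
```

With this step size, the component along each eigenvector shrinks by a factor of (1 - lambda_i / lambda_max) per step. The direction with the smallest eigenvalue therefore converges slowest, and a large condition number leaves the iterate far from the minimum after the same number of steps.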
Real-world example
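Training large neural networks like transformers relies on stable gradient flow: backpropagation multiplies Jacobian matrices layer by layer, so gradients can vanish or explode when those products shrink or amplify vectors too strongly.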
Common mistakes
- Treating optimization as purely scalar tuning (for example, a single learning rate) and ignoring that curvature differs from one direction to another.
Follow-up questions
- Why do eigenvalues affect convergence speed?