Senior | Gradient Descent
What is the difference between coordinate descent and gradient descent?
Updated May 16, 2026
Short answer
Coordinate descent optimizes one parameter (or one block of parameters) at a time while holding the rest fixed; gradient descent updates all parameters simultaneously using the full gradient.
Deep explanation
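Coordinate descent breaks the optimization into a sequence of single-variable subproblems: it fixes every parameter except one, minimizes the objective along that coordinate, then moves on to the next. This is efficient when each one-dimensional update is cheap or has a closed form, which is typical of sparse or separable problems. Gradient descent instead computes the full gradient and moves all parameters at once along the negative gradient direction. The trade-off is between cost per update and the number of updates needed: a coordinate update is cheap but only uses information about one direction, while a gradient step costs more but moves in every direction at once.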
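To make the contrast concrete, here is a minimal sketch on a small quadratic, f(x) = 0.5 * x^T A x - b^T x, whose minimizer solves A x = b. The matrix `A`, vector `b`, learning rate, and iteration counts are illustrative assumptions, not values from the text, and the coordinate update shown is the exact one-dimensional minimizer, which is only available in closed form for simple objectives like this one.

```python
def gradient(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x, which is A x - b."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]


def gradient_descent(A, b, x0, lr=0.1, steps=200):
    """Move every coordinate at once along the negative gradient."""
    x = list(x0)
    for _ in range(steps):
        g = gradient(A, b, x)  # computed once from the current x
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def coordinate_descent(A, b, x0, sweeps=50):
    """Minimize f exactly along one coordinate at a time, cycling through them."""
    x = list(x0)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            # Setting df/dx_i = 0 with the other coordinates held fixed gives
            # x_i = (b_i - sum_{j != i} A_ij x_j) / A_ii.
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / A[i][i]  # later coordinates see this new value
    return x


# Illustrative symmetric positive-definite problem (assumed values).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

x_gd = gradient_descent(A, b, [0.0, 0.0])
x_cd = coordinate_descent(A, b, [0.0, 0.0])
print("gradient descent:  ", x_gd)
print("coordinate descent:", x_cd)
```

Both routines should approach the same minimizer here. The structural difference is that `gradient_descent` computes one gradient from a single snapshot of `x`, while `coordinate_descent` immediately reuses each updated coordinate when updating the next.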
Real-world example
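Lasso regression is often solved with coordinate descent; scikit-learn's Lasso and the glmnet package both use it. The L1 penalty is not differentiable at zero, but it is separable across coefficients, so each single-coefficient update has a closed-form soft-thresholding solution.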
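A minimal sketch of that soft-thresholding update follows, assuming the objective (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 with no intercept. The function names, the toy data, and the fixed number of sweeps are hypothetical choices for illustration; this is not how any particular library implements it, and production solvers add convergence checks, intercept handling, and cached residuals.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping to exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    Assumes no column of X is all zeros (otherwise z below would be 0).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form minimizer of the one-dimensional subproblem in w_j.
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Toy data (assumed): y depends strongly on feature 0 and only weakly on feature 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.05, 1.95, -1.95, -2.05]

w = lasso_coordinate_descent(X, y, alpha=0.1)
print(w)
```

On this toy design the two columns are orthogonal, so the weak coefficient is thresholded to exactly zero while the strong one is shrunk by `alpha`. That exact zero is the sparsity that makes Lasso useful, and it comes directly from the per-coordinate update.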
Common mistakes
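- Assuming coordinate descent always converges faster. It can be slow when parameters are strongly coupled (for example, highly correlated features), because progress along one coordinate is partly undone by the next.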
Follow-up questions
- When is coordinate descent better?
- Why is GD preferred in deep learning?