What is the difference between coordinate descent and gradient descent?

Updated May 16, 2026

Short answer

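Coordinate descent minimizes the objective along one parameter (or a small block of parameters) at a time while holding the rest fixed; gradient descent updates all parameters simultaneously by stepping along the negative gradient.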

Deep explanation

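Coordinate descent breaks the optimization into a sequence of single-variable subproblems: it picks a coordinate (cyclically, at random, or greedily), minimizes or takes a step along that coordinate with the others held fixed, and repeats. This is efficient when each one-dimensional update is cheap or has a closed form, as in sparse problems or objectives whose non-smooth part is separable across coordinates. Gradient descent computes the full gradient and moves every parameter at once in the negative gradient direction, scaled by a step size. The trade-offs are in per-iteration cost and convergence speed: a coordinate update is usually much cheaper than a full gradient step but changes only one parameter, so which method reaches a given accuracy first depends on the problem's structure and conditioning.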
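
To make the contrast concrete, here is a minimal sketch of both methods on a small quadratic, f(x) = 0.5 * x^T A x - b^T x. The matrix A, the vector b, the step size, and the iteration counts are illustrative assumptions, not values from any particular source. The coordinate update uses exact one-dimensional minimization, which is one common variant among several.

```python
def grad(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x, which is A x - b."""
    n = len(b)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]

def gradient_descent(A, b, lr, steps):
    """Update every coordinate at once along the negative gradient."""
    x = [0.0] * len(b)
    for _ in range(steps):
        g = grad(A, b, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def coordinate_descent(A, b, sweeps):
    """Exactly minimize over one coordinate at a time, holding the others fixed."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Solve d f / d x_i = 0 for x_i with the other coordinates fixed.
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / A[i][i]
    return x

# Assumed toy problem: A is symmetric positive definite, so the minimizer solves A x = b.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
x_gd = gradient_descent(A, b, lr=0.25, steps=200)
x_cd = coordinate_descent(A, b, sweeps=50)
print(x_gd, x_cd)  # both should approach [0.2, 0.4]
```

Each inner step of coordinate_descent touches only one row of A, while each step of gradient_descent needs the full matrix-vector product. That difference in per-step cost is the core of the trade-off.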

Real-world example

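Lasso regression is often solved with coordinate descent, for example in scikit-learn's Lasso estimator and in glmnet. The L1 penalty is not differentiable at zero, which complicates plain gradient descent, but it is separable across coefficients, so each single-coefficient update has a closed-form soft-thresholding solution.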
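
As a rough illustration of why Lasso suits coordinate descent, the sketch below minimizes (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 one coefficient at a time. The function names, the toy data, and the value of alpha are assumptions chosen so the result can be checked by hand; this is not the implementation used by any library.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, sweeps=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 one weight at a time."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the residual that ignores feature j
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form one-dimensional solution (assumes feature j is not all zeros).
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w

# Assumed toy data with orthogonal columns; ordinary least squares would give [2.0, 0.1].
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
w = lasso_coordinate_descent(X, y, alpha=0.1)
print(w)  # first weight shrinks to about 1.8, second is set exactly to 0.0
```

The weak second feature is driven exactly to zero rather than merely made small, which is the sparsity that the soft-thresholding step produces.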

Common mistakes

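  • Assuming coordinate descent always converges faster. Its advantage depends on cheap per-coordinate updates; when the variables are strongly coupled, it can need many sweeps to make progress.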

Follow-up questions

  • When is coordinate descent better?
  • Why is GD preferred in deep learning?
