seniorGradient Descent
What is asynchronous stochastic gradient descent?
Updated May 16, 2026
Short answer
Asynchronous SGD updates parameters without waiting for all workers.
Deep explanation
Each worker computes gradients independently and updates shared parameters without synchronization, improving speed but introducing stale gradient issues.
Real-world example
Large-scale recommendation systems at companies like Google.
Common mistakes
- Ignoring stale gradient effects.
Follow-up questions
- What is gradient staleness?
- When is async SGD useful?