What is asynchronous stochastic gradient descent?

Updated May 16, 2026

Short answer

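Asynchronous SGD is a distributed variant of SGD in which each worker applies its gradient update to the shared parameters as soon as it is ready, without waiting for the other workers.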

Deep explanation

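Each worker reads the current shared parameters, computes a gradient on its own mini-batch, and applies the update without a synchronization barrier. This improves throughput because fast workers never wait for slow ones, but it introduces stale gradients: by the time a worker applies its update, other workers may already have changed the parameters, so the gradient was computed for an older version of the model.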
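The effect of staleness can be illustrated with a small simulation. The sketch below is not a real distributed implementation: it assumes a toy quadratic objective f(w) = 0.5 * ||w||^2, and it models asynchrony by computing every gradient on parameters that are a fixed number of updates old. The function name `simulate_async_sgd` and its parameters (`delay`, `steps`, `lr`, `dim`, `seed`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_async_sgd(delay, steps=200, lr=0.5, dim=5, seed=0):
    """Toy model of async SGD on f(w) = 0.5 * ||w||^2.

    Each update uses a gradient computed on parameters that are
    `delay` updates old (delay=0 behaves like ordinary SGD).
    Returns the norm of the final parameters; the optimum is w = 0.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    history = [w.copy()]  # every parameter version seen so far

    for _ in range(steps):
        # The worker read the parameters `delay` updates ago
        # (or the initial parameters if fewer updates have happened).
        stale_index = max(0, len(history) - 1 - delay)
        stale_w = history[stale_index]

        # The gradient of 0.5 * ||w||^2 is w, evaluated at the stale copy.
        grad = stale_w

        # The update is applied to the current shared parameters.
        w = w - lr * grad
        history.append(w.copy())

    return float(np.linalg.norm(w))


if __name__ == "__main__":
    for d in (0, 1, 5):
        print(f"delay={d}: final ||w|| = {simulate_async_sgd(d):.3e}")
```

In this toy setup, the same learning rate that converges quickly with no delay stops converging once the gradients are several updates old, which is why async SGD is often paired with a smaller learning rate or a bound on staleness.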

Real-world example

Large-scale recommendation systems at companies like Google.

Common mistakes

  • Ignoring stale gradient effects, which can slow or destabilize convergence as the number of workers or the learning rate grows.

Follow-up questions

  • What is gradient staleness?
  • When is async SGD useful?
