What is stochastic gradient descent?

Updated May 15, 2026

Short answer

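SGD updates model parameters using the gradient computed from one randomly chosen training sample at a time (or, in common usage, a small mini-batch) instead of the full dataset.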

Deep explanation

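Each step estimates the full-dataset gradient from a single sample or a small batch. The estimate is noisy but far cheaper to compute, so the parameters are updated many more times per pass over the data, which makes training faster and scalable to large datasets.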
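As a rough illustration of the idea above, here is a minimal sketch of mini-batch SGD fitting a linear model with NumPy. The function name `sgd_linear_regression`, the synthetic data, and the hyperparameter values are assumptions chosen for the example, not a reference implementation.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, batch_size=8, epochs=50, seed=0):
    """Fit y ≈ X @ w + b with mini-batch SGD on mean squared error."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the data every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb                    # residuals on this batch only
            grad_w = 2.0 * (Xb.T @ err) / len(idx)   # MSE gradient w.r.t. w
            grad_b = 2.0 * err.mean()                # MSE gradient w.r.t. b
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic data (hypothetical): y = 2*x0 - 1*x1 + 0.5 + small noise
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
true_b = 0.5
y = X @ true_w + true_b + data_rng.normal(scale=0.1, size=200)

w, b = sgd_linear_regression(X, y)
print("estimated w:", w, "estimated b:", b)
```

Setting `batch_size=1` gives single-sample SGD, and setting it to the dataset size gives ordinary batch gradient descent; only the amount of data behind each gradient estimate changes.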

Real-world example

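SGD and its variants (such as SGD with momentum and Adam) are the standard way to train large-scale deep learning models, where computing the gradient over the entire dataset for every update is impractical.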

Common mistakes

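  • Assuming it always converges smoothly. Because each gradient is a noisy estimate, the loss fluctuates from step to step, and with a fixed learning rate the parameters keep bouncing around a minimum rather than settling exactly on it.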
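To make the non-smooth convergence concrete, the sketch below runs single-sample SGD on a one-parameter regression problem and counts how often the full-dataset loss goes up after an update. The data, the learning rate, and the helper `full_loss` are assumptions made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.5, size=200)  # hypothetical noisy data

def full_loss(w):
    """Mean squared error over the whole dataset for slope w."""
    return float(np.mean((w * x - y) ** 2))

w, lr = 0.0, 0.05
losses = [full_loss(w)]
for i in rng.permutation(200):
    grad = 2.0 * x[i] * (w * x[i] - y[i])  # gradient from one sample only
    w -= lr * grad
    losses.append(full_loss(w))

# Count the updates after which the full-dataset loss got worse
increases = sum(later > earlier for earlier, later in zip(losses, losses[1:]))
print("loss before:", losses[0], "loss after:", losses[-1])
print("updates that increased the loss:", increases)
```

The loss ends well below where it started, yet a number of individual updates make it worse. That is the expected behavior of a noisy gradient estimate, not a sign of a bug.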

Follow-up questions

  • SGD vs batch gradient descent?
  • Why use mini-batches?
