What is stochastic gradient descent and how is it different from batch gradient descent?

Updated May 17, 2026

Short answer

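Stochastic gradient descent (SGD) updates the model parameters using the gradient of one randomly chosen training example at a time. Batch gradient descent computes each update from the gradient over the entire training set.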
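
The difference can be written as two update rules. This assumes the training loss is the average of per-example losses, L(θ) = (1/n) Σ ℓ_i(θ), over n examples with learning rate η; this notation is introduced here for illustration. When i is drawn uniformly at random, the single-example gradient equals the full gradient on average. SGD therefore follows the right direction in expectation, but with noise.

```latex
% Batch gradient descent: every update uses all n training examples
\theta \leftarrow \theta - \eta \, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell_i(\theta)

% Stochastic gradient descent: every update uses one example i drawn at random
\theta \leftarrow \theta - \eta \, \nabla_\theta \ell_i(\theta), \qquad i \sim \mathrm{Uniform}\{1, \dots, n\}

% The SGD step is an unbiased estimate of the batch step
\mathbb{E}_i\!\left[\nabla_\theta \ell_i(\theta)\right] = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell_i(\theta) = \nabla_\theta L(\theta)
```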

Deep explanation

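Batch gradient descent computes the gradient over the full dataset before taking a single step. Each update is accurate and stable, but it is expensive: the optimizer needs a complete pass over the data to make one update. Stochastic gradient descent (SGD) updates the weights after every individual sample. Each step is cheap and progress starts immediately, but the gradient estimate is noisy, so the loss fluctuates instead of decreasing smoothly. Mini-batch gradient descent balances the two by averaging gradients over small batches of samples. This reduces the noise and allows efficient vectorized computation. The noise in SGD updates can also help the optimizer move out of shallow local minima and saddle points. In practice, the learning rate is often decayed during training so that SGD settles near a minimum instead of bouncing around it.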
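
The three variants differ only in how many samples feed each update. The sketch below fits a one-variable linear model with a single hypothetical function, `gradient_descent`, whose `batch_size` argument selects the variant. The synthetic data (y = 2x + 1 plus Gaussian noise) and the learning rates and epoch counts are illustrative assumptions, not tuned values.

```python
import random


def make_data(n=200, seed=0):
    """Hypothetical synthetic data: y = 2x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the model y = w*x + b on the full dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, batch_size, lr, epochs, seed=0):
    """Fit y = w*x + b by minimizing mean squared error.

    batch_size == len(xs) -> batch gradient descent (1 update per epoch)
    batch_size == 1       -> stochastic gradient descent (len(xs) updates per epoch)
    anything in between   -> mini-batch gradient descent
    Returns (w, b, number_of_updates).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n = len(xs)
    indices = list(range(n))
    updates = 0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new random order each epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of the squared error over this batch only.
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            # One parameter update per batch.
            w -= lr * grad_w
            b -= lr * grad_b
            updates += 1
    return w, b, updates


if __name__ == "__main__":
    xs, ys = make_data()
    runs = [("batch", len(xs), 0.1, 200), ("SGD", 1, 0.05, 20), ("mini-batch", 32, 0.1, 50)]
    for name, bs, lr, epochs in runs:
        w, b, updates = gradient_descent(xs, ys, bs, lr, epochs)
        print(f"{name:>10}: w={w:.3f} b={b:.3f} updates={updates} mse={mse(w, b, xs, ys):.4f}")
```

All three runs should land near w = 2 and b = 1. The update counts show the trade-off described above. Batch gradient descent makes one update per pass, SGD makes one update per sample, and mini-batch sits in between. Keeping all three in one function makes the point that only the batch size changes.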
