What is stochastic gradient descent and how is it different from batch gradient descent?
Updated May 17, 2026
Short answer
SGD updates model parameters using the gradient from one randomly chosen sample at a time, while batch gradient descent computes the gradient over the entire dataset before each update.
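The two update rules can be written side by side. This is a sketch in assumed notation that does not appear in the original answer: $\theta$ denotes the parameters, $\eta$ the learning rate, $N$ the number of training samples, and $\ell_i$ the loss on sample $i$.

```latex
\begin{aligned}
\text{Batch GD:} \quad & \theta \leftarrow \theta - \eta \, \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \ell_i(\theta) \\
\text{SGD:} \quad & \theta \leftarrow \theta - \eta \, \nabla_\theta \ell_i(\theta), \quad i \text{ drawn at random from } \{1, \dots, N\}
\end{aligned}
```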
Deep explanation
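Batch gradient descent computes the gradient over the full dataset for every update, so each step is stable but expensive, and progress is slow on large datasets. Stochastic gradient descent (SGD) updates the weights after each sample, which makes each update much cheaper but noisier, because a single-sample gradient is only a rough estimate of the full gradient. Mini-batch gradient descent balances the two by averaging the gradient over a small batch of samples. The noise in SGD updates can also help the optimizer escape shallow local minima.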
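The three variants differ only in how many samples feed each gradient estimate, which a short sketch can make concrete. The example below is illustrative, not a reference implementation: it assumes a linear model with mean squared error loss, synthetic data, and hand-picked learning rates, and the names `gradient` and `train` are hypothetical helpers introduced here.

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def train(X, y, batch_size, lr=0.1, epochs=100, seed=0):
    """Gradient descent where batch_size selects the variant:
    len(y) gives batch GD, 1 gives SGD, anything in between gives mini-batch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(w, X[idx], y[idx])  # one parameter update
    return w

# Synthetic regression problem (assumed for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

w_batch = train(X, y, batch_size=len(y))    # 1 update per epoch
w_sgd = train(X, y, batch_size=1, lr=0.01)  # 200 updates per epoch
w_mini = train(X, y, batch_size=32)         # 7 updates per epoch

print("batch GD  :", w_batch)
print("SGD       :", w_sgd)
print("mini-batch:", w_mini)
```

All three runs should land near `true_w`. The single-sample run uses a smaller learning rate here because its gradient estimates are noisier, which is a common but not mandatory choice.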