What is stochastic gradient descent and how is it different from batch gradient descent?

Updated May 17, 2026

Short answer

Stochastic gradient descent (SGD) updates the model parameters after each individual training sample, while batch gradient descent computes the gradient over the entire dataset before making a single update.

Deep explanation

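Batch gradient descent computes the gradient over the full dataset before every update, so each step follows the exact gradient of the training loss. The updates are stable, but each one is expensive on large datasets. Stochastic gradient descent (SGD) updates the weights after each individual sample, so every update is much cheaper and the model starts improving sooner, but the gradient estimates are noisy and the loss fluctuates from step to step. Mini-batch gradient descent balances the two by averaging the gradient over a small batch of samples per update. The noise in SGD updates can also help the optimizer escape shallow local minima, though this is not guaranteed.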
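
To make the comparison concrete, here is a minimal sketch of all three variants, assuming a linear regression model with mean squared error loss on synthetic data. The function names (gradient, train), the learning rate, and the epoch count are illustrative choices rather than part of the original answer. The only thing that changes between the variants is batch_size: the full dataset for batch gradient descent, 1 for SGD, and a small number such as 32 for mini-batch.

```python
import numpy as np


def gradient(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)


def train(X, y, batch_size, lr=0.1, epochs=100, seed=0):
    """Gradient descent where batch_size selects the variant.

    batch_size == len(y) -> batch gradient descent (one update per epoch)
    batch_size == 1      -> stochastic gradient descent (one update per sample)
    anything in between  -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(w, X[idx], y[idx])
    return w


# Synthetic data (an assumption made for illustration): y = X @ true_w + small noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * data_rng.normal(size=200)

w_batch = train(X, y, batch_size=len(y))  # 1 update per epoch
w_sgd = train(X, y, batch_size=1)         # 200 updates per epoch
w_mini = train(X, y, batch_size=32)       # 7 updates per epoch

print("batch:     ", w_batch)
print("sgd:       ", w_sgd)
print("mini-batch:", w_mini)
```

On this small convex problem all three variants should end up close to true_w. The practical difference is the number and cost of updates per epoch: batch gradient descent makes one expensive update, SGD makes one cheap update per sample, and mini-batch sits in between.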
