Junior | Gradient Descent
What is the difference between batch, stochastic, and mini-batch gradient descent?
Updated May 16, 2026
Short answer
They differ in how many training examples are used to compute the gradient for each parameter update.
Deep explanation
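Batch gradient descent computes the gradient over the full dataset for every update, so each step follows the true gradient of the training loss but is expensive on large datasets. Stochastic gradient descent (SGD) updates after every single sample, which makes each step cheap but noisy. Mini-batch gradient descent updates on a small subset of the data (commonly tens to a few hundred samples), accepting some gradient noise in exchange for many more updates per pass over the data and efficient vectorized computation. The trade-off is between stability of each update, cost per update, and gradient noise.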
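As a minimal sketch of the point above, the three variants can be written as one loop where only the batch size changes. This assumes a linear regression model with mean squared error loss on synthetic data; the function name gradient_descent, the learning rates, and the dataset are illustrative choices, not part of the original answer.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit linear regression weights by minimizing mean squared error.

    batch_size == len(X) -> batch GD (one update per epoch)
    batch_size == 1      -> stochastic GD (one update per sample)
    anything in between  -> mini-batch GD
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the MSE loss computed on this batch only
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
            updates += 1
    return w, updates

# Hypothetical synthetic data: y = 3*x0 - 2*x1 + small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

w_batch, n_batch = gradient_descent(X, y, batch_size=len(X))  # batch GD
w_sgd, n_sgd = gradient_descent(X, y, batch_size=1, lr=0.01)  # stochastic GD
w_mini, n_mini = gradient_descent(X, y, batch_size=32)        # mini-batch GD

print("batch:     ", w_batch, "updates:", n_batch)
print("stochastic:", w_sgd, "updates:", n_sgd)
print("mini-batch:", w_mini, "updates:", n_mini)
```

For the same 50 passes over 200 samples, batch GD makes 50 updates, mini-batch GD with batch size 32 makes 350, and stochastic GD makes 10,000. The stochastic run uses a smaller learning rate here because single-sample gradients are noisier; that value is a tuning assumption for this toy problem, not a general rule.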
Real-world example
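Deep neural networks are typically trained with mini-batch gradient descent, with the batch size chosen so that each batch fits in GPU memory and keeps the hardware busy.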
Common mistakes
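- Assuming stochastic GD always converges faster or to a better solution. Its updates are cheap but noisy, so with a fixed learning rate it can keep oscillating around a minimum instead of settling.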
Follow-up questions
- Why is mini-batch most popular?
- Does batch GD scale well?