What is the difference between batch, stochastic, and mini-batch gradient descent?

Updated May 16, 2026

Short answer

They differ in how many training examples are used to compute the gradient for each update step.

Deep explanation

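Batch gradient descent computes the gradient over the full dataset for every update, stochastic gradient descent (SGD) uses a single sample per update, and mini-batch gradient descent uses a small subset of the data per update. The trade-off is between stability, speed, and noise: full-batch updates follow the exact gradient but each one is expensive, single-sample updates are cheap but noisy, and mini-batches sit in between.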
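As a rough illustration, the three variants can be written as one loop in which only the batch size changes. The sketch below fits a one-variable linear model with plain Python; the function name gradient_descent, the toy data, the learning rate, and the epoch count are all assumptions chosen for the example, not part of any particular library.

```python
import random

def gradient_descent(xs, ys, batch_size, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error.

    batch_size == len(xs) -> batch GD (one update per epoch)
    batch_size == 1       -> stochastic GD (one update per sample)
    anything in between   -> mini-batch GD
    Returns (w, b, number_of_updates).
    """
    rng = random.Random(seed)
    n = len(xs)
    w, b = 0.0, 0.0
    updates = 0
    indices = list(range(n))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            # Average the squared-error gradient over the current batch only.
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
            updates += 1
    return w, b, updates

# Hypothetical noise-free toy data generated from y = 3x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [3.0 * x + 1.0 for x in xs]

for name, size in [("batch", len(xs)), ("stochastic", 1), ("mini-batch", 4)]:
    w, b, updates = gradient_descent(xs, ys, batch_size=size)
    print(f"{name:10s} w={w:.4f} b={b:.4f} updates={updates}")
```

For the same number of epochs, the smaller the batch, the more parameter updates are made, but each update is based on less data. On this noise-free toy problem all three settings recover the same line; on real, noisy data the small-batch runs would fluctuate more.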

Real-world example

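Mini-batch GD is the standard choice for training deep neural networks, where the dataset is too large to process in full for every update and batches map well onto GPU parallelism.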

Common mistakes

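  • Assuming stochastic GD always converges faster or to a better solution. Its updates are cheap but noisy, so with a fixed learning rate it tends to fluctuate around a minimum rather than settle on it.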
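One way to see where the noise in small-batch updates comes from is to measure how much the gradient estimate varies from batch to batch at a fixed parameter value. The sketch below is a small experiment under assumed toy data and a one-parameter model y ~ w*x; the helper name gradient_spread and all the numbers are hypothetical.

```python
import random
import statistics

def gradient_spread(xs, ys, w, batch_size, trials=2000, seed=0):
    """Standard deviation of the batch-averaged gradient d(loss)/dw
    across randomly drawn batches, with w held fixed."""
    rng = random.Random(seed)
    # Per-sample gradient of the squared error (w*x - y)**2 with respect to w.
    per_sample = [2 * (w * x - y) * x for x, y in zip(xs, ys)]
    estimates = []
    for _ in range(trials):
        batch = rng.sample(per_sample, batch_size)  # draw without replacement
        estimates.append(sum(batch) / batch_size)
    return statistics.pstdev(estimates)

# Hypothetical toy data from y = 3x, evaluated at a deliberately wrong w.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [3.0 * x for x in xs]
w = 1.0

for size in (1, 4, len(xs)):
    print(f"batch_size={size}: spread={gradient_spread(xs, ys, w, size):.3f}")
```

The spread is largest for single-sample batches, smaller for mini-batches, and zero when every batch is the full dataset, since the full batch always yields the same gradient.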

Follow-up questions

  • Why is mini-batch most popular?
  • Does batch GD scale well?
