Senior · Gradient Descent
What is the mini-batch noise-stability trade-off?
Updated May 16, 2026
Short answer
It is the trade-off in mini-batch gradient descent between noisy gradient estimates, which come with small batches, and stable convergence, which comes with large ones.
Deep explanation
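Smaller batches give noisier gradient estimates: with batch size B, the standard deviation of the mini-batch gradient shrinks roughly as 1/sqrt(B). That noise can help exploration, but it also makes individual updates less stable. Larger batches stabilize updates, but they may generalize worse and they cost more computation per update.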
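To make the noise side of the trade-off concrete, here is a minimal sketch that measures how much the mini-batch gradient fluctuates at a fixed point for a few batch sizes. The synthetic 1-D least-squares problem, the function name `minibatch_gradient_std`, and all constants are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np

def minibatch_gradient_std(batch_size, n_trials=2000, seed=0):
    """Empirical std of the mini-batch gradient of a 1-D least-squares loss at w = 0."""
    rng = np.random.default_rng(seed)
    # Synthetic data (assumed for illustration): y = 3x + Gaussian noise.
    x = rng.normal(size=10_000)
    y = 3.0 * x + rng.normal(size=10_000)
    w = 0.0
    # Per-example gradient of 0.5 * (w*x - y)^2 with respect to w.
    per_example_grad = (w * x - y) * x
    # Draw n_trials mini-batches (with replacement) and average within each batch.
    idx = rng.integers(0, x.size, size=(n_trials, batch_size))
    batch_grads = per_example_grad[idx].mean(axis=1)
    return batch_grads.std()

stds = {b: minibatch_gradient_std(b) for b in (32, 64, 128)}
for b, s in stds.items():
    print(f"batch size {b:>3}: gradient std = {s:.3f}")
```

On a problem like this, quadrupling the batch size should roughly halve the spread of the gradient estimate, which is the 1/sqrt(B) behaviour behind the "larger batches stabilize updates" claim.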
Real-world example
Training deep learning models with mini-batch sizes such as 32, 64, or 128.
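The same batch sizes can be compared on a toy problem. The sketch below runs constant-learning-rate mini-batch SGD on a 1-D linear regression and reports how much the weight keeps jittering after it has settled. It stands in for a real deep learning run; the data, the helper name `sgd_tail_spread`, the learning rate, and the step count are all illustrative assumptions.

```python
import numpy as np

def sgd_tail_spread(batch_size, lr=0.05, n_steps=5000, seed=0):
    """Run mini-batch SGD and return (mean, std) of w over the second half of training."""
    rng = np.random.default_rng(seed)
    # Synthetic data (assumed for illustration): y = 3x + Gaussian noise.
    x = rng.normal(size=10_000)
    y = 3.0 * x + rng.normal(size=10_000)
    w = 0.0
    history = np.empty(n_steps)
    for t in range(n_steps):
        # Sample one mini-batch with replacement.
        idx = rng.integers(0, x.size, size=batch_size)
        # Mini-batch gradient of 0.5 * (w*x - y)^2 with respect to w.
        grad = np.mean((w * x[idx] - y[idx]) * x[idx])
        w -= lr * grad
        history[t] = w
    # Discard the first half so only the settled phase is measured.
    tail = history[n_steps // 2:]
    return tail.mean(), tail.std()

results = {b: sgd_tail_spread(b) for b in (32, 64, 128)}
for b, (mean_w, std_w) in results.items():
    print(f"batch size {b:>3}: mean w = {mean_w:.3f}, std of w = {std_w:.4f}")
```

All three runs should hover near the true slope of 3, with the smaller batches hovering less tightly. This convex toy only shows the stability side; it says nothing about generalization, which is where large batches can lose out.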
Common mistakes
- Assuming a larger batch size always improves performance.
Follow-up questions
- What is the large-batch training issue?
- Why does noise help?