Senior | Gradient Descent
What is the mini-batch generalization gap?
Updated May 16, 2026
Short answer
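It is the difference between a model's training and test performance, viewed as a function of the batch size used in mini-batch training: large-batch runs often reach a similar training loss to small-batch runs but perform worse on held-out data.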
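As a minimal sketch of how the gap itself is measured, the snippet below fits a linear model to a small synthetic dataset and reports test loss minus training loss. The data, the sizes, and the helper names (`mse`, `generalization_gap`) are assumptions made for illustration; it shows only the definition of the gap, not any batch-size effect.

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of a linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))

def generalization_gap(train_loss, test_loss):
    """Test loss minus training loss; positive means worse on held-out data."""
    return test_loss - train_loss

rng = np.random.default_rng(0)

# Hypothetical setup: few training points relative to features, so the fit overfits.
n_train, n_test, d = 30, 1000, 20
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + rng.normal(scale=1.0, size=n_train)
y_test = X_test @ w_true + rng.normal(scale=1.0, size=n_test)

# Least-squares fit on the training set only.
w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_loss = mse(w_hat, X_train, y_train)
test_loss = mse(w_hat, X_test, y_test)
gap = generalization_gap(train_loss, test_loss)
print(f"train={train_loss:.3f} test={test_loss:.3f} gap={gap:.3f}")
```

The same subtraction applies to accuracy or any other metric, as long as the sign convention is stated.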
Deep explanation
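Mini-batch training estimates the gradient from a random subset of the data, so each update carries noise whose magnitude shrinks as the batch grows. Smaller batches often generalize better because this noise acts as an implicit regularizer and helps the optimizer escape sharp minima. Large batches produce less noisy updates and tend to converge to sharp minima, which are associated with a larger generalization gap.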
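The stochasticity described above can be made concrete with a small experiment. The sketch below, on assumed synthetic linear-regression data, measures how far mini-batch gradients fall from the full-batch gradient for several batch sizes. It illustrates only that gradient noise decreases with batch size; it does not by itself demonstrate better or worse generalization. The function names and the chosen batch sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic regression problem.
n, d = 4096, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.5, size=n)
w = np.zeros(d)  # point at which all gradients are evaluated

def batch_gradient(w, Xb, yb):
    """Gradient of the mean squared error over the batch (Xb, yb)."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

def gradient_noise(batch_size, trials=500):
    """Average squared distance between mini-batch and full-batch gradients."""
    full = batch_gradient(w, X, y)
    sq_dists = []
    for _ in range(trials):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = batch_gradient(w, X[idx], y[idx])
        sq_dists.append(np.sum((g - full) ** 2))
    return float(np.mean(sq_dists))

noise = {b: gradient_noise(b) for b in (8, 64, 512)}
for b, v in noise.items():
    print(f"batch_size={b:4d}  gradient noise={v:.4f}")
```

The measured noise should fall as the batch size increases, which is the effect the explanation attributes the regularizing behavior of small batches to.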
Real-world example
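In distributed data-parallel training, the effective batch size grows with the number of workers. A model trained this way can match the training loss of a small-batch run yet generalize worse on test data.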
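The arithmetic behind that example can be sketched as follows. The function names and numbers are hypothetical. The learning-rate adjustment shown is the linear scaling rule, a commonly used heuristic for large-batch training that is not guaranteed to close the gap and is usually paired with a warmup period.

```python
def effective_batch_size(per_device_batch, num_devices, grad_accum_steps=1):
    """Number of samples contributing to one optimizer step in data-parallel training."""
    return per_device_batch * num_devices * grad_accum_steps

def linearly_scaled_lr(base_lr, base_batch, batch):
    """Linear scaling heuristic: scale the learning rate by the same factor as the batch."""
    return base_lr * batch / base_batch

# Hypothetical setup: 64 workers, 32 samples each, tuned baseline at batch 256.
batch = effective_batch_size(per_device_batch=32, num_devices=64)
lr = linearly_scaled_lr(base_lr=0.1, base_batch=256, batch=batch)
print(batch, lr)
```

Here 64 workers with 32 samples each give an effective batch of 2048, eight times the baseline of 256, so the heuristic suggests an eight times larger learning rate.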
Common mistakes
- Assuming that a lower training loss always means better test performance.
Follow-up questions
- Why do small batches generalize better?
- How can the gap be reduced?