senior · Keras
Why does reducing batch size sometimes improve generalization?
Updated May 16, 2026
Short answer
Small batches add noise to each gradient estimate, which helps the optimizer escape sharp minima and settle in flatter ones that tend to generalize better.
Deep explanation
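Each mini-batch gradient is an estimate of the full-dataset gradient, and with fewer samples per batch that estimate is noisier: its variance scales roughly as 1/B for batch size B. The noise acts as an implicit regularizer. It makes it harder for the optimizer to settle into sharp, narrow minima that fit the training set very closely, and nudges it toward flatter regions that often generalize better to unseen data. The effect is not guaranteed: it interacts with the learning rate, and very small batches can slow down or destabilize training.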
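The noise argument can be checked numerically. The following is a minimal sketch in plain Python rather than Keras, assuming a toy one-parameter linear regression with synthetic data; the helper names `make_data`, `minibatch_grad`, and `grad_std` are hypothetical and exist only for this illustration. It measures how much the mini-batch gradient fluctuates at a fixed weight for several batch sizes. It demonstrates the noise itself, not the generalization benefit.

```python
import random
import statistics


def make_data(n, seed=0):
    """Synthetic data for illustration only: y = 3x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def minibatch_grad(w, xs, ys, idx):
    """Gradient of the mean squared error (w*x - y)^2 w.r.t. w over one mini-batch."""
    return sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in idx) / len(idx)


def grad_std(batch_size, xs, ys, w=0.0, trials=2000, seed=1):
    """Standard deviation of the mini-batch gradient across many random batches."""
    rng = random.Random(seed)
    n = len(xs)
    grads = [
        minibatch_grad(w, xs, ys, rng.sample(range(n), batch_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(grads)


xs, ys = make_data(4096)
noise = {b: grad_std(b, xs, ys) for b in (8, 32, 128)}
for b, s in noise.items():
    print(f"batch_size={b:4d}  gradient std ~ {s:.4f}")
```

Because the variance scales roughly as 1/B, the standard deviation should fall by about a factor of two each time the batch size is multiplied by four. In Keras the corresponding knob is the `batch_size` argument of `model.fit`; whether a smaller value actually improves validation metrics has to be measured on the model and data at hand.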
Real-world example
No real-world example available yet.
Common mistakes
No common mistakes listed yet.
Follow-up questions
No follow-up questions available yet.