Senior · Keras

Why does reducing batch size sometimes improve generalization?

Updated May 16, 2026

Short answer

Small batches add noise to the gradient estimate, which can help the optimizer escape sharp minima that tend to generalize poorly.

Deep explanation

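A mini-batch gradient is only an estimate of the full-dataset gradient, and the variance of that estimate grows as the batch shrinks, roughly in proportion to 1 / batch size. This noise acts as an implicit regularizer: it makes it harder for the optimizer to settle into sharp, narrow minima that fit the training set very closely, and nudges it toward flatter minima, which often generalize better to unseen data. The effect is not guaranteed. It interacts with the learning rate and the number of updates, which is why a smaller batch helps only sometimes.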
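The claim that smaller batches give noisier gradients can be checked numerically. The following is a minimal sketch in plain Python, not Keras code: the synthetic dataset, the one-parameter linear model, and the helper names (`make_data`, `batch_gradient`, `gradient_noise_std`) are all assumptions made for illustration. It measures how much the mini-batch gradient varies across random batches at a fixed weight.

```python
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical synthetic 1-D regression data: y = 3x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def batch_gradient(w, xs, ys, idx):
    """Gradient of the mean of 0.5 * (w*x - y)**2 with respect to w over idx."""
    return sum((w * xs[i] - ys[i]) * xs[i] for i in idx) / len(idx)


def gradient_noise_std(w, xs, ys, batch_size, trials=2000, seed=1):
    """Standard deviation of the mini-batch gradient across random batches."""
    rng = random.Random(seed)
    n = len(xs)
    grads = []
    for _ in range(trials):
        # Draw a batch by sampling indices with replacement.
        idx = [rng.randrange(n) for _ in range(batch_size)]
        grads.append(batch_gradient(w, xs, ys, idx))
    return statistics.pstdev(grads)


xs, ys = make_data(5000)
w = 0.0  # evaluate the noise at a fixed, untrained weight

noise = {b: gradient_noise_std(w, xs, ys, b) for b in (8, 32, 128)}
for b, s in noise.items():
    print(f"batch_size={b:4d}  gradient std={s:.4f}")
```

Under these assumptions the spread should shrink as the batch grows, by roughly the square root of the batch-size ratio. In Keras the corresponding knob is the `batch_size` argument of `model.fit`; whether the extra noise actually improves validation performance has to be checked per model and dataset.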
