What is the mini-batch generalization gap?

Updated May 16, 2026

Short answer

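The generalization gap is the difference between a model's performance on its training data and on unseen test data. In mini-batch training, the size of that gap depends partly on the training dynamics the batches induce, most notably through the batch size.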

Deep explanation

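Mini-batch training estimates the gradient from a random subset of the data, so every update carries noise, and that noise shrinks as the batch grows. Smaller batches often generalize better, which is commonly attributed to the gradient noise acting as an implicit regularizer that steers optimization toward flat minima. Large batches produce less noisy updates and may converge to sharp minima, which is associated with a larger generalization gap.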
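To make the quantity concrete, here is a minimal sketch of how the gap could be measured for two batch sizes. Everything in it is an assumption made for illustration: the synthetic data, the helper names (make_data, train, generalization_gap), and the hyperparameters. It uses a small logistic regression model, whose loss is convex, so it does not reproduce the sharp-versus-flat minima effect seen in deep networks; it only shows the measurement itself.

```python
import math
import random


def make_data(n, rng, noise=0.2):
    """Synthetic binary task (hypothetical): label is sign(x0 + x1), with label noise."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if x[0] + x[1] > 0 else 0
        if rng.random() < noise:
            y = 1 - y  # flip a fraction of labels so the task is not separable
        data.append((x, y))
    return data


def predict(w, b, x):
    z = w[0] * x[0] + w[1] * x[1] + b
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(w, b, data):
    """Average binary cross-entropy over a dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data, batch_size, epochs, lr, rng):
    """Plain mini-batch SGD on logistic regression."""
    w, b = [0.0, 0.0], 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            gw, gb = [0.0, 0.0], 0.0
            for x, y in batch:
                err = predict(w, b, x) - y
                gw[0] += err * x[0]
                gw[1] += err * x[1]
                gb += err
            n = len(batch)
            w[0] -= lr * gw[0] / n
            w[1] -= lr * gw[1] / n
            b -= lr * gb / n
    return w, b


def generalization_gap(batch_size, seed=0):
    """Return (train_loss, test_loss, gap) where gap = test_loss - train_loss."""
    rng = random.Random(seed)
    train_set = make_data(64, rng)    # small training set
    test_set = make_data(2000, rng)   # larger held-out set
    w, b = train(train_set, batch_size, epochs=50, lr=0.5, rng=rng)
    train_loss = mean_loss(w, b, train_set)
    test_loss = mean_loss(w, b, test_set)
    return train_loss, test_loss, test_loss - train_loss


if __name__ == "__main__":
    for bs in (4, 64):  # small batch vs. full batch
        tr, te, gap = generalization_gap(bs)
        print(f"batch_size={bs:3d} train={tr:.3f} test={te:.3f} gap={gap:.3f}")
```

The point of the sketch is the comparison of train_loss and test_loss per batch size, not any particular number it prints. On a real network the same measurement would be repeated across several seeds before drawing conclusions.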

Real-world example

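In distributed data-parallel training, the effective batch size grows with the number of workers, and the resulting large-batch runs can generalize worse than a comparable small-batch run.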

Common mistakes

Assuming that a lower training loss always means better performance on unseen data.
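A small sketch of why this assumption fails: picking the "best" epoch by training loss and by validation loss can give different answers. The loss values below are made up purely for illustration, and best_epoch is a hypothetical helper, not part of any library.

```python
def best_epoch(history, metric):
    """Return the index of the epoch with the lowest value of `metric`."""
    return min(range(len(history)), key=lambda i: history[i][metric])


# Hypothetical loss curves, invented for this example only.
history = [
    {"train_loss": 0.90, "val_loss": 0.95},
    {"train_loss": 0.60, "val_loss": 0.70},
    {"train_loss": 0.40, "val_loss": 0.62},
    {"train_loss": 0.25, "val_loss": 0.68},
    {"train_loss": 0.10, "val_loss": 0.81},
]

by_train = best_epoch(history, "train_loss")  # last epoch: training loss keeps falling
by_val = best_epoch(history, "val_loss")      # earlier epoch: validation loss has turned up

print(f"best by train_loss: epoch {by_train}, best by val_loss: epoch {by_val}")
```

In these invented curves the training loss keeps decreasing while the validation loss starts rising, so the gap widens even as training looks better.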

Follow-up questions

Why do small batches generalize better?
How can the gap be reduced?
