What is the mini-batch generalization gap?

Updated May 16, 2026

Short answer

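It is the part of the gap between training and test performance that depends on mini-batch size. In practice the term usually refers to models trained with large batches generalizing worse than models trained with small batches, even when both reach a similar training loss.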
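As a formula, write θ_B for the parameters reached by training with mini-batch size B and L for the loss. This notation is chosen here for illustration and is not part of the original definition:

```latex
\mathrm{gap}(B) = \mathcal{L}_{\mathrm{test}}(\theta_B) - \mathcal{L}_{\mathrm{train}}(\theta_B),
\qquad
\Delta = \mathrm{gap}(B_{\mathrm{large}}) - \mathrm{gap}(B_{\mathrm{small}})
```

A positive Δ, measured for runs trained to similar training loss, is the batch-size effect this question is about.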

Deep explanation

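Each mini-batch gradient is a noisy estimate of the full-batch gradient. For independently sampled examples, its variance shrinks roughly in proportion to 1/B, where B is the batch size. Small batches therefore inject more noise into the updates. This noise acts as an implicit regularizer: it helps the optimizer escape sharp regions and settle in flatter minima, which tend to generalize better. Large batches give low-noise gradients and, as observed by Keskar et al. (2017), tend to converge to sharp minima, where the loss rises steeply when the parameters move slightly. A small mismatch between the training and test loss surfaces then costs more, which widens the generalization gap. The sharp-versus-flat explanation is widely cited, though not universally accepted.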
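The two mechanisms above can be illustrated with a toy Python sketch that uses only the standard library. The first part estimates how the variance of a mini-batch gradient falls as the batch grows, on a hypothetical 1-D linear regression problem. The second part uses a simplified picture in which the test loss curve is the training loss curve shifted slightly, so a sharp minimum (high curvature) loses more than a flat one. The dataset, the helper names (`grad_variance`, `loss_increase_under_shift`), and the shifted-curve model are illustrative assumptions, not measurements on a real network.

```python
import random
import statistics

# Hypothetical toy problem: 1-D linear regression y = 3x + noise with
# squared loss 0.5 * (w*x - y)^2, evaluated at a fixed weight w.
random.seed(0)
N = 10_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
data = [(x, 3.0 * x + random.gauss(0.0, 1.0)) for x in xs]


def per_example_grad(w: float, x: float, y: float) -> float:
    """Gradient of 0.5 * (w*x - y)^2 with respect to w."""
    return (w * x - y) * x


def minibatch_grad(w: float, batch_size: int, rng: random.Random) -> float:
    """Average per-example gradients over a batch sampled with replacement."""
    batch = [data[rng.randrange(N)] for _ in range(batch_size)]
    return sum(per_example_grad(w, x, y) for x, y in batch) / batch_size


def grad_variance(w: float, batch_size: int, trials: int = 2000, seed: int = 1) -> float:
    """Empirical variance of the mini-batch gradient across many sampled batches."""
    rng = random.Random(seed)
    grads = [minibatch_grad(w, batch_size, rng) for _ in range(trials)]
    return statistics.pvariance(grads)


def loss_increase_under_shift(curvature: float, shift: float = 0.1) -> float:
    """Toy sharpness picture: the training loss 0.5*h*w^2 has its minimum at w=0,
    and the test loss is the same curve shifted to 0.5*h*(w - shift)^2.
    Returns the test loss evaluated at the training minimum w=0."""
    return 0.5 * curvature * (0.0 - shift) ** 2


if __name__ == "__main__":
    # Variance should drop by roughly the same factor that the batch size grows.
    for b in (1, 8, 64):
        print(f"B={b:>2}: gradient variance ~ {grad_variance(0.0, b):.3f}")
    # Same shift, 100x the curvature, so 100x the extra test loss.
    print("flat  (h=1):  ", loss_increase_under_shift(1.0))
    print("sharp (h=100):", loss_increase_under_shift(100.0))
```

Sampling with replacement keeps the 1/B scaling exact in expectation. Sampling without replacement from a finite dataset shrinks the variance slightly faster, but the trend is the same.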

Real-world example

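Data-parallel training across many GPUs multiplies the per-device batch by the number of devices, so the global batch can reach thousands of examples. When such large-batch runs reuse a recipe tuned for small batches, they often end with worse test accuracy.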
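One widely used adjustment for large global batches is the linear scaling rule with gradual warmup described by Goyal et al. (2017). The rule multiplies the learning rate by the same factor as the batch size and ramps up to that rate over the first part of training. The sketch below assumes a reference of learning rate 0.1 at batch size 256 (the reference used in that paper), a hypothetical cluster size, and hypothetical helper names. It only computes the schedule and does not train anything.

```python
# Linear scaling rule with gradual warmup (after Goyal et al., 2017).
# Helper names are hypothetical; the reference values are assumptions.


def scaled_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Scale the learning rate by the same factor as the batch size."""
    return base_lr * global_batch / base_batch


def lr_at_step(step: int, base_lr: float, base_batch: int,
               global_batch: int, warmup_steps: int) -> float:
    """Ramp linearly from base_lr to the scaled rate, then hold it (decay omitted)."""
    target = scaled_lr(base_lr, base_batch, global_batch)
    if warmup_steps <= 0 or step >= warmup_steps:
        return target
    return base_lr + (step / warmup_steps) * (target - base_lr)


def noise_scale(lr: float, batch: int) -> float:
    """Rough proxy for the magnitude of SGD gradient noise: learning rate / batch size."""
    return lr / batch


if __name__ == "__main__":
    workers, per_worker_batch = 8, 128          # hypothetical cluster
    global_batch = workers * per_worker_batch   # 1024 examples per step
    for step in (0, 250, 500, 1000):
        lr = lr_at_step(step, 0.1, 256, global_batch, warmup_steps=500)
        print(f"step {step:>4}: lr = {lr:.3f}")
    # Linear scaling keeps lr / B equal to the small-batch reference.
    print(noise_scale(0.1, 256),
          noise_scale(scaled_lr(0.1, 256, global_batch), global_batch))
```

Keeping lr / B fixed is one informal argument for why linear scaling preserves small-batch behavior. Goyal et al. report that the rule stops working beyond a certain batch size, so it narrows the gap without guaranteeing to close it.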

Common mistakes

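  • Assuming lower training loss always means better test performance. A large-batch run can match a small-batch run on training loss and still do worse on held-out data.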

Follow-up questions

  • Why do small batches generalize better?
  • How can the gap be reduced?
