midCNN

What is batch normalization and why does it improve CNN training stability?

Updated May 15, 2026

Short answer

Batch normalization normalizes layer inputs during training, reducing internal covariate shift and stabilizing learning.

Deep explanation

Batch normalization standardizes activations by subtracting mean and dividing by variance for each mini-batch. It then scales and shifts the normalized output using learnable parameters. This stabilizes distributions across layers, allowing higher learning rates and faster convergence while reducing sensitivity to initialization.

Real-world example

Used in almost all modern CNN architectures like ResNet and EfficientNet.

Common mistakes

  • Using batch norm incorrectly during inference without switching to eval mode.

Follow-up questions

  • What is internal covariate shift?
  • Why does batch norm allow higher learning rates?

More CNN interview questions

View all →