What is batch normalization and why does it improve CNN training stability?

Updated May 15, 2026

Short answer

Batch normalization normalizes layer inputs during training, reducing internal covariate shift and stabilizing learning.

Deep explanation

Batch normalization standardizes activations by subtracting mean and dividing by variance for each mini-batch. It then scales and shifts the normalized output using learnable parameters. This stabilizes distributions across layers, allowing higher learning rates and faster convergence while reducing sensitivity to initialization.

Real-world example

Used in almost all modern CNN architectures like ResNet and EfficientNet.

Common mistakes

Using batch norm incorrectly during inference without switching to eval mode.

Follow-up questions

What is internal covariate shift?
Why does batch norm allow higher learning rates?

Short answer

Deep explanation

Real-world example

Common mistakes

Follow-up questions

More CNN interview questions