midCNN
What is batch normalization and why does it improve CNN training stability?
Updated May 15, 2026
Short answer
Batch normalization normalizes layer inputs during training, reducing internal covariate shift and stabilizing learning.
Deep explanation
Batch normalization standardizes activations by subtracting mean and dividing by variance for each mini-batch. It then scales and shifts the normalized output using learnable parameters. This stabilizes distributions across layers, allowing higher learning rates and faster convergence while reducing sensitivity to initialization.
Real-world example
Used in almost all modern CNN architectures like ResNet and EfficientNet.
Common mistakes
- Using batch norm incorrectly during inference without switching to eval mode.
Follow-up questions
- What is internal covariate shift?
- Why does batch norm allow higher learning rates?