What is batch normalization and why is it used?
Updated May 17, 2026
Short answer
Batch normalization stabilizes and accelerates training by normalizing each layer's inputs over the current mini-batch.
Deep explanation
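Batch normalization normalizes each feature's activations to zero mean and unit variance using the mean and variance of the current mini-batch, then scales and shifts the result with learnable parameters so the layer keeps its representational power. It was originally motivated as a way to reduce internal covariate shift, the change in a layer's input distribution as earlier layers update. Later work has questioned that explanation, but the practical effect holds: training is more stable and tolerates higher learning rates.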
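The normalize, then scale-and-shift computation described above can be sketched in plain Python. This is a minimal illustration, not a framework implementation: the function name `batch_norm_train`, the parameter names `gamma` and `beta`, and the default `eps=1e-5` are assumptions chosen for the example.

```python
import math

def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Normalize each feature (column) of `batch` with the batch's own statistics.

    `batch` is a list of rows, `gamma` and `beta` are per-feature lists.
    All names here are hypothetical, chosen for illustration.
    """
    n = len(batch)
    num_features = len(batch[0])

    # Per-feature mean and (biased) variance over the batch dimension.
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(num_features)
    ]

    # Normalize, then apply the learnable scale (gamma) and shift (beta).
    # eps keeps the division stable when a feature's variance is near zero.
    out = [
        [
            gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta[j]
            for j in range(num_features)
        ]
        for row in batch
    ]
    return out, means, variances


batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
out, means, variances = batch_norm_train(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to 1 and `beta` set to 0, both output columns end up with mean approximately 0 and variance approximately 1, even though the input features differ in scale by a factor of ten.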
Real-world example
ResNet applies batch normalization after each convolution and before the activation, which helps its very deep networks converge faster on image classification.
Common mistakes
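- Leaving the layer in training mode during inference, so it normalizes with the current batch's statistics instead of the running averages accumulated during training. Outputs then depend on which other samples share the batch.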
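The inference-mode mistake above is easiest to see with a toy layer. The sketch below is a simplified, single-feature illustration: the class `BatchNorm1D`, its `training` flag, and the `momentum` default are hypothetical names for this example, and the scale and shift parameters are omitted for brevity. Real frameworks differ in details such as how the running variance is estimated.

```python
import math

class BatchNorm1D:
    """Toy single-feature batch norm layer (hypothetical, for illustration)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def __call__(self, xs):
        if self.training:
            # Training mode: use the current batch's statistics...
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            # ...and fold them into the running averages for later inference.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation mode: use the accumulated running statistics.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in xs]


bn = BatchNorm1D()
for _ in range(200):
    bn([4.0, 6.0, 8.0])  # running mean approaches 6, running variance 8/3

bn.training = False
correct = bn([10.0])     # normalized against the running statistics

bn.training = True       # the mistake: training mode at inference time
wrong = bn([10.0])       # a batch of one is its own mean, so this is [0.0]
```

In training mode a single-sample batch always normalizes to zero regardless of its value, and the call also overwrites part of the running statistics. In evaluation mode the same input yields a meaningful value that does not depend on the rest of the batch.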
Follow-up questions
- What is internal covariate shift?
- Why does batch norm act as a regularizer?