Senior · Gradient Descent
What is batch normalization, and how does it relate to Gradient Descent?
Updated May 16, 2026
Short answer
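Batch normalization (BN) standardizes each layer's inputs over the current mini-batch. This keeps their scale stable during training, so Gradient Descent can use larger learning rates and converge in fewer steps.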
Deep explanation
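Batch normalization normalizes each feature of a layer's input using the mean and variance of the current mini-batch, then applies a learned scale (gamma) and shift (beta). It was originally motivated as a way to reduce internal covariate shift, the change in a layer's input distribution as earlier layers update. Later analysis suggests that much of the benefit comes from a smoother loss surface: gradients become more predictable, so Gradient Descent tolerates higher learning rates and converges faster.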
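The normalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the training-time forward pass only, not a framework implementation: the function name batch_norm_forward, the eps default, and the toy data are assumptions made for the example, and running statistics for inference are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for a (batch, features) array.

    Hypothetical helper for illustration; eps is an assumed default.
    """
    mean = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                       # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # standardize each feature
    return gamma * x_hat + beta               # learned scale and shift

# Toy mini-batch: 64 samples, 4 features, deliberately off-center and wide.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

gamma = np.ones(4)   # identity scale
beta = np.zeros(4)   # zero shift

y = batch_norm_forward(x, gamma, beta)

# Rescaling the input by a constant barely changes the normalized output.
y_scaled = batch_norm_forward(10.0 * x, gamma, beta)
```

With gamma = 1 and beta = 0, each column of y has approximately zero mean and unit variance regardless of how the input was shifted or scaled. The near-equality of y and y_scaled shows that the output is insensitive to the scale of the incoming activations, which is one commonly cited reason BN tolerates larger learning rates.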
Real-world example
Training deep CNNs for image recognition: adding BN after the convolutional layers makes training faster and more stable.
Common mistakes
- Thinking BN only improves accuracy; it also affects optimization dynamics.
Follow-up questions
- Why does BN allow higher learning rates?
- Is BN always beneficial?