What is batch normalization and its relation to Gradient Descent?

Updated May 16, 2026

Short answer

Batch normalization stabilizes inputs to layers, improving Gradient Descent efficiency.

Deep explanation

Batch normalization normalizes layer inputs to reduce internal covariate shift. This smooths loss surfaces, allowing higher learning rates and faster convergence in Gradient Descent.

Real-world example

Training deep CNNs for image recognition faster and more stably.

Common mistakes

Thinking BN only improves accuracy
it also affects optimization dynamics.

Follow-up questions

Why does BN allow higher learning rates?
Is BN always beneficial?

Short answer

Deep explanation

Real-world example

Common mistakes

Follow-up questions

More Gradient Descent interview questions