What is batch normalization and its relation to Gradient Descent?

Updated May 16, 2026

Short answer

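Batch normalization standardizes each layer's inputs over the current mini-batch and then applies a learned scale and shift. This keeps activation scales stable, so gradient descent tolerates larger learning rates and converges faster.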

Deep explanation

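Batch normalization (BN) standardizes each layer's inputs using the mean and variance of the current mini-batch, then applies a learned scale (gamma) and shift (beta). It was originally motivated as a way to reduce internal covariate shift, meaning the change in a layer's input distribution as earlier layers update. Later analysis suggests that the larger effect is a smoother loss surface. In either view, gradients become more predictable, which allows higher learning rates and faster convergence in gradient descent.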
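The normalization step described above can be sketched in a few lines of NumPy. This is a minimal, training-mode-only illustration for a (batch, features) array, not a framework's implementation: the function name `batch_norm_forward` and the `eps` default are assumptions made for the example, and running statistics for inference are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over a (batch, features) array (illustrative sketch)."""
    mean = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                        # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # standardize each feature
    return gamma * x_hat + beta                # learned scale and shift

# Hypothetical activations with a large offset and spread
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(64, 3))

out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
```

With `gamma` set to ones and `beta` to zeros, each column of `out` has approximately zero mean and unit variance, regardless of the scale of `x`. During training, `gamma` and `beta` are updated by gradient descent like any other parameter.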

Real-world example

Deep CNNs for image recognition commonly place batch normalization between convolutional layers, which makes training faster and more stable.

Common mistakes

  • Thinking BN only improves accuracy; it also affects optimization dynamics.
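To make the point about optimization dynamics concrete, here is a toy analogy rather than batch normalization itself: plain gradient descent on a linear least-squares problem, with and without standardized input features. BN applies a similar rescaling to the inputs of every layer. The data, the helper name `gd_final_loss`, and the learning rates are all assumptions chosen for illustration.

```python
import numpy as np

def gd_final_loss(X, y, lr, steps=200):
    """Run plain gradient descent on mean squared error and return the final loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE with respect to w
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)

rng = np.random.default_rng(0)
# Two features with very different scales (roughly 1 and 100)
X = rng.normal(size=(256, 2)) * np.array([1.0, 100.0])
y = X @ np.array([3.0, 0.02])                   # noiseless linear target

# Standardize each feature to zero mean and unit variance, and center the target
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
y_centered = y - y.mean()

# Raw features: the large-scale feature caps the stable learning rate at
# roughly 1e-4, so the small-scale direction barely moves in 200 steps.
loss_raw = gd_final_loss(X, y, lr=1e-5)

# Standardized features: a much larger learning rate is stable and converges.
loss_std = gd_final_loss(X_std, y_centered, lr=0.1)
```

Under these assumptions, `loss_std` ends many orders of magnitude below `loss_raw` after the same number of steps. The normalized problem is better conditioned, which is the same reason BN is credited with permitting higher learning rates.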

Follow-up questions

  • Why does BN allow higher learning rates?
  • Is BN always beneficial?
