What is normalization in deep vision networks (BatchNorm vs LayerNorm)?

Updated May 15, 2026

Short answer

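BatchNorm normalizes each channel using statistics computed across the batch (and, for convolutional feature maps, across spatial positions); LayerNorm normalizes each sample using statistics computed across that sample's own features.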
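
To make the difference in axes concrete, here is a minimal NumPy sketch for a feature map of shape (N, C, H, W). The function names are illustrative, the learnable scale and shift are left out, and LayerNorm is applied over (C, H, W) as one common convention; transformers usually normalize over the last (embedding) dimension only.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # One mean/variance per channel, pooled over batch and spatial axes.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # One mean/variance per sample, pooled over that sample's features.
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))  # (N, C, H, W)

bn_out = batch_norm(x)
ln_out = layer_norm(x)

# LayerNorm output for a sample does not depend on the rest of the batch;
# BatchNorm output does.
ln_single = layer_norm(x[:1])
bn_single = batch_norm(x[:1])
```

Here `ln_single` matches `ln_out[:1]`, while `bn_single` differs from `bn_out[:1]` because the batch statistics changed.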

Deep explanation

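BatchNorm depends on batch statistics during training, so its mean and variance estimates become noisy when batches are small; at inference it switches to running averages collected during training. LayerNorm computes its statistics per sample, so it behaves the same in training and inference regardless of batch size, and it is widely used in transformers. Both stabilize training, but they respond differently to distribution shift: BatchNorm's stored running statistics can go stale when test data differs from training data, whereas LayerNorm recomputes its statistics from each input.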
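
The train/inference split can be sketched with a toy BatchNorm over (N, C) features. The class name `TinyBatchNorm` and the momentum and epsilon values are assumptions chosen for illustration, and the running variance uses the biased estimate for simplicity; this is not any particular library's implementation.

```python
import numpy as np

class TinyBatchNorm:
    """Toy BatchNorm over (N, C) inputs, without learnable scale/shift."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps
        self.training = True

    def __call__(self, x):
        if self.training:
            # Training: normalize with the current batch's statistics
            # and fold them into the running averages.
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Inference: normalize with the stored running statistics.
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
bn = TinyBatchNorm(num_features=3)

# "Train" on data centered at 0.
for _ in range(200):
    bn(rng.normal(0.0, 1.0, size=(64, 3)))

# Evaluate on shifted data centered at 5.
shifted = rng.normal(5.0, 1.0, size=(64, 3))
bn.training = False
eval_out = bn(shifted)   # uses stale running stats, so output stays far from 0

bn.training = True
train_out = bn(shifted)  # uses batch stats (and also updates the running stats)
```

In inference mode the shifted batch is normalized with statistics learned from the old distribution, so `eval_out` is centered near 5 rather than 0, while `train_out` is centered at 0.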

Real-world example

Vision Transformers (ViT) use LayerNorm inside each transformer block.

Common mistakes

  • Using BatchNorm when training with very small batches, where the per-batch mean and variance estimates are too noisy to be reliable.
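
As a rough illustration of why small batches hurt, the sketch below measures how much the per-batch mean fluctuates for two batch sizes drawn from the same unit-variance distribution. The helper name `batch_mean_spread` and the batch sizes are hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_mean_spread(batch_size, num_batches=1000):
    # Draw many batches from N(0, 1) and measure how much the
    # per-batch mean varies from batch to batch.
    batches = rng.normal(0.0, 1.0, size=(num_batches, batch_size))
    return batches.mean(axis=1).std()

spread_small = batch_mean_spread(2)    # batch size 2
spread_large = batch_mean_spread(64)   # batch size 64
```

The spread of the batch mean shrinks roughly as one over the square root of the batch size, so the statistics BatchNorm normalizes with are far less stable at batch size 2 than at 64.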

Follow-up questions

  • When is LayerNorm preferred?
  • Why does BatchNorm sometimes fail at inference?
