What is normalization in deep vision networks (BatchNorm vs LayerNorm)?

Updated May 15, 2026

Short answer

BatchNorm normalizes each feature (channel) using statistics computed across the batch; LayerNorm normalizes each sample using statistics computed across its own features.

Deep explanation

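BatchNorm computes its mean and variance from the current mini-batch, so its behavior depends on batch size: small batches give noisy estimates. At inference it normally switches to running averages collected during training. LayerNorm computes its statistics within each sample, so it is independent of batch size and behaves the same in training and inference, which is one reason it is widely used in transformers. Both stabilize training, but they respond differently to distribution shift: BatchNorm's stored statistics can go stale when the test distribution differs from the training distribution, whereas LayerNorm recomputes its statistics for every input.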
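The difference in normalization axis can be illustrated with a minimal pure-Python sketch. It is a simplification, not a framework implementation: it assumes a 2D input (samples by features, no spatial dimensions), omits the learnable scale and shift, and uses a hypothetical `EPS` constant and helper names chosen only for this example.

```python
import math

EPS = 1e-5  # assumed small constant for numerical stability


def _normalize(values):
    """Shift and scale a list of floats to zero mean and (near) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [(v - mean) / math.sqrt(var + EPS) for v in values]


def batch_norm(batch):
    """Normalize each feature column using statistics taken across the batch."""
    columns = [_normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch):
    """Normalize each sample (row) using statistics taken across its own features."""
    return [_normalize(row) for row in batch]


# The same sample placed in two different batches.
sample = [1.0, 2.0, 3.0, 4.0]
batch_a = [sample, [0.0, 0.0, 0.0, 0.0], [2.0, 4.0, 6.0, 8.0]]
batch_b = [sample, [10.0, -5.0, 7.0, 1.0], [3.0, 3.0, 9.0, -2.0]]

ln_a, ln_b = layer_norm(batch_a)[0], layer_norm(batch_b)[0]
bn_a, bn_b = batch_norm(batch_a)[0], batch_norm(batch_b)[0]

print("LayerNorm output for the sample is batch-independent:", ln_a == ln_b)
print("BatchNorm output for the sample is batch-independent:", bn_a == bn_b)
```

The LayerNorm output for `sample` is identical in both batches, while the BatchNorm output changes with the other samples in the batch. That coupling between samples is what makes BatchNorm sensitive to batch size and batch composition.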

Real-world example

Vision Transformers (ViT) use LayerNorm in their transformer blocks.

Common mistakes

Using BatchNorm when training with very small batches, where the batch statistics are too noisy to be reliable.
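To see why small batches are a problem, the sketch below estimates how much a per-batch mean fluctuates at two batch sizes. The activations are synthetic (drawn from a standard normal with a fixed seed), and the function name and batch sizes are illustrative assumptions, not values from any particular model.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical activations for one channel: true mean 0, true std 1.
activations = [random.gauss(0.0, 1.0) for _ in range(4096)]


def spread_of_batch_means(values, batch_size):
    """Split values into consecutive batches; return the std-dev of the per-batch means."""
    means = [
        statistics.fmean(values[i:i + batch_size])
        for i in range(0, len(values) - batch_size + 1, batch_size)
    ]
    return statistics.pstdev(means)


small = spread_of_batch_means(activations, 2)
large = spread_of_batch_means(activations, 64)

print(f"batch size  2: std of batch means = {small:.3f}")
print(f"batch size 64: std of batch means = {large:.3f}")
```

For independent samples, the spread of the batch mean shrinks roughly as one over the square root of the batch size, so a batch of 2 yields a far noisier estimate than a batch of 64. BatchNorm normalizes with exactly these per-batch estimates during training.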

Follow-up questions

When is LayerNorm preferred?
Why does BatchNorm sometimes fail at inference?
