senior · CNN

How do CNNs achieve translational equivariance and why is it not full invariance?

Updated May 15, 2026

Short answer

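CNNs are translation-equivariant because a convolution applies the same weights at every position, so shifting the input shifts the feature maps by the same amount. Pooling and other architecture choices add only approximate invariance on top of that, not full invariance.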

Deep explanation

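A convolution layer slides the same kernel over every spatial position, so when an object moves in the input image, its response in the feature map moves by the same amount. This is translational equivariance: f(shift(x)) = shift(f(x)). Invariance is the stronger property that the output does not change at all, f(shift(x)) = f(x), and convolution alone does not provide it, because the feature maps still encode where each feature occurred. Pooling layers add invariance to small local shifts, and global average pooling removes position from the final feature vector. In practice the result is still only approximate: with strided convolutions or pooling, equivariance holds exactly only for shifts that are multiples of the total stride, and padding treats positions near the border differently. Full invariance is also not always desirable, since it would discard spatial information that later layers or position-sensitive tasks rely on.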
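
The difference between the two properties can be checked numerically. The sketch below is a minimal illustration and does not describe any particular framework's implementation: it assumes a single-channel image, a stride of 1, and circular (wrap-around) padding so that shifts are exact at the borders. The helper name circular_conv2d, the array sizes, and the shift amount are all hypothetical choices made for the example.

```python
import numpy as np

def circular_conv2d(image, kernel):
    """Cross-correlate a 2D image with a 2D kernel, stride 1, wrap-around padding.

    Hypothetical helper for illustration; the output has the same shape as the input.
    """
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for di in range(kh):
        for dj in range(kw):
            # Rolling by (-di, -dj) lines up image[i + di, j + dj] with out[i, j].
            out += kernel[di, dj] * np.roll(image, shift=(-di, -dj), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
shift = (2, 3)  # move the "object" 2 rows down and 3 columns right

shifted_image = np.roll(image, shift=shift, axis=(0, 1))

features = circular_conv2d(image, kernel)
features_of_shifted = circular_conv2d(shifted_image, kernel)

# Equivariance: convolving the shifted input equals shifting the original feature map.
equivariant = bool(np.allclose(
    features_of_shifted, np.roll(features, shift=shift, axis=(0, 1))
))

# No invariance at the feature-map level: the two maps differ position by position.
invariant_feature_map = bool(np.allclose(features_of_shifted, features))

# Global average pooling collapses the spatial axes, so its output ignores the shift.
gap_invariant = bool(np.isclose(features.mean(), features_of_shifted.mean()))

print(equivariant, invariant_feature_map, gap_invariant)  # True False True
```

The exact match in the first and third checks depends on the assumptions above. With zero padding, a stride greater than 1, or local max pooling, both properties hold only approximately, which is the sense in which real CNNs fall short of full invariance.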
