senior · CNN

How do CNNs learn spatial hierarchies, and why is the locality assumption critical?

Updated May 15, 2026

Short answer

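CNNs learn spatial hierarchies by exploiting local connectivity: early layers detect simple patterns such as edges, and deeper layers combine them into progressively larger structures, up to whole objects. This works because of the locality assumption, which holds that nearby pixels are more strongly correlated than distant ones.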

Deep explanation

CNNs assume that spatially close pixels in an image are more likely to share meaningful relationships than distant ones. This locality assumption lets each convolution filter look at only a small patch of its input (a small receptive field) rather than the whole image. Because every layer operates on the outputs of the layer before it, the effective receptive field grows with depth, so higher layers can combine local features into global concepts. This hierarchical feature composition is fundamental to how CNNs achieve strong performance on vision tasks.
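To make the growth of receptive fields with depth concrete, here is a minimal sketch that applies the standard receptive-field recurrence to a stack of convolution or pooling layers. The function name `receptive_fields` and the example layer stack are illustrative assumptions, not part of the original answer. The sketch also ignores dilation and padding effects at the image border.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) tuples, a hypothetical
    description of a conv/pool stack chosen for illustration.
    """
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # distance in input pixels between adjacent outputs
    sizes = []
    for kernel, stride in layers:
        # Each extra kernel tap widens the view by one step of the
        # current jump; striding then increases the jump for later layers.
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Assumed example stack: two 3x3 convs, a 2x2 stride-2 pool, two more 3x3 convs.
stack = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)]
print(receptive_fields(stack))  # [3, 5, 6, 10, 14]
```

Each filter stays small (3x3 here), yet the last layer's units respond to a 14x14 region of the input. This is how stacked local operations can represent increasingly global structure.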
