How do CNNs learn spatial hierarchies and why is locality assumption critical?
Updated May 15, 2026
Short answer
CNNs learn spatial hierarchies by exploiting local connectivity, gradually building from edges to objects, relying on the assumption that nearby pixels are more correlated.
Deep explanation
CNNs assume that spatially close pixels in images are more likely to share meaningful relationships than distant ones. This locality assumption allows convolution filters to focus on small receptive fields. As depth increases, receptive fields expand, enabling higher layers to combine local features into global concepts. This hierarchical feature composition is fundamental to how CNNs achieve strong performance in vision tasks.
Unlock with a Pro subscription to view this section.
View pricingReal-world example
No real-world example available yet.
Unlock with a Pro subscription to view this section.
Upgrade to ProCommon mistakes
No common mistakes listed yet.
Unlock with a Pro subscription to view this section.
Upgrade to ProFollow-up questions
No follow-up questions available yet.
Unlock with a Pro subscription to view this section.
Upgrade to Pro