What is a backbone-neck-head architecture in object detection?
Updated May 15, 2026
Short answer
It is a modular detector design in which the backbone extracts features from the image, the neck fuses those features across scales, and the head turns them into predictions.
Deep explanation
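The backbone (for example ResNet) is usually an image-classification network, often pretrained on ImageNet, with its classifier removed. It turns the image into feature maps at several resolutions. Early stages keep fine spatial detail but carry little semantic information, while deep stages are coarse but semantically strong. The neck (for example FPN) fuses these maps, typically by upsampling coarser maps and merging them with finer ones, so that every output level has both detail and context. The head runs on the fused maps and predicts bounding-box coordinates and class scores for each location or region. The parts communicate only through feature maps, so each one can be swapped or retrained independently. The multi-scale fusion in the neck also helps the detector handle objects of very different sizes.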
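To make the data flow concrete, here is a minimal, dependency-free Python sketch. It is a toy built on stated assumptions and does not show how ResNet or FPN are actually implemented. 2x2 average pooling stands in for the backbone's learned strided convolutions. The neck follows FPN's top-down idea (upsample the coarser map and add it to the finer one) but omits FPN's lateral and smoothing convolutions. The head is a hypothetical single-class scorer that emits each above-threshold cell as a box. The names `backbone`, `fpn_neck`, `toy_head`, and `detector` are illustrative only.

```python
# Toy backbone -> neck -> head pipeline on a single-channel image stored as nested lists.
# Hypothetical sketch: fixed pooling/upsampling stand in for the learned convolutions of real models.
from typing import Callable, List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int, float]  # (x0, y0, x1, y1, score) in input-pixel coordinates


def avg_pool2(x: Grid) -> Grid:
    """Halve the resolution by averaging each 2x2 block (stand-in for a strided conv stage)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def upsample2(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling, as in FPN's top-down pathway."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps with the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def backbone(image: Grid, num_stages: int = 3) -> List[Grid]:
    """Feature maps ordered fine to coarse; each stage halves the resolution."""
    feats, x = [], image
    for _ in range(num_stages):
        x = avg_pool2(x)
        feats.append(x)
    return feats


def fpn_neck(feats: List[Grid]) -> List[Grid]:
    """Top-down fusion: add the upsampled coarser fused level into each finer level."""
    fused = [feats[-1]]
    for f in reversed(feats[:-1]):
        fused.insert(0, add(f, upsample2(fused[0])))
    return fused


def toy_head(level: Grid, stride: int, threshold: float = 0.5) -> List[Box]:
    """Dense single-class head: emit the cell's footprint as a box wherever the score passes."""
    return [(j * stride, i * stride, (j + 1) * stride, (i + 1) * stride, v)
            for i, row in enumerate(level)
            for j, v in enumerate(row) if v > threshold]


def detector(image: Grid,
             bb: Callable[[Grid], List[Grid]] = backbone,
             neck: Callable[[List[Grid]], List[Grid]] = fpn_neck,
             head: Callable[[Grid, int], List[Box]] = toy_head) -> List[List[Box]]:
    """Compose the three parts; any one can be swapped without touching the others."""
    levels = neck(bb(image))
    return [head(level, len(image) // len(level)) for level in levels]


# A 16x16 image with a bright 4x4 square at rows/cols 4..7.
image = [[1.0 if 4 <= r < 8 and 4 <= c < 8 else 0.0 for c in range(16)] for r in range(16)]
predictions = detector(image)
for level, boxes in enumerate(predictions):
    print(level, boxes)
```

With this input, the stride-4 level returns a single box, (4, 4, 8, 8, 1.25), that covers the square exactly. The stride-2 level returns four smaller boxes that tile it. The stride-8 level returns none, because pooling dilutes the square below the threshold. Trying a different neck or head only requires passing another function with the same signature to `detector`.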
Real-world example
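Faster R-CNN with FPN pairs a ResNet backbone and an FPN neck with a two-stage head. That head is a region proposal network followed by a per-region classification and box-regression head. YOLOv4 uses a CSPDarknet53 backbone, an SPP plus PANet neck, and the anchor-based YOLOv3 head.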
Common mistakes
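- Confusing the neck with the backbone. The backbone computes features from the raw image, while the neck only recombines the feature maps the backbone has already produced (for example, across scales).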
Follow-up questions
- Why separate these components?
- What does the prediction head output?
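One concrete answer to the last question comes from an anchor-based, YOLOv3-style head. The sketch below computes its raw output shape, assuming a 416x416 input, strides 8, 16, and 32, 3 anchors per grid cell, and the 80 COCO classes. Each anchor predicts 4 box values, 1 objectness score, and one score per class. The helper names are hypothetical. Two-stage heads such as Faster R-CNN's instead output class scores and box refinements for each proposed region.

```python
# Raw output layout of an anchor-based, YOLOv3-style detection head.
# Assumed settings: 416x416 input, strides 8/16/32, 3 anchors per cell, 80 classes.
from typing import Dict, List, Sequence, Tuple


def head_output_shape(grid: int, anchors_per_cell: int, num_classes: int) -> Tuple[int, int, int]:
    """Each anchor predicts 4 box values + 1 objectness + num_classes class scores."""
    return (grid, grid, anchors_per_cell * (5 + num_classes))


def split_prediction(vec: Sequence[float], num_classes: int) -> Dict[str, object]:
    """Split one anchor's raw vector into its parts (before sigmoid and box decoding)."""
    if len(vec) != 5 + num_classes:
        raise ValueError(f"expected {5 + num_classes} values, got {len(vec)}")
    return {"box": list(vec[:4]), "objectness": vec[4], "class_scores": list(vec[5:])}


input_size, strides, anchors, classes = 416, (8, 16, 32), 3, 80
shapes: List[Tuple[int, int, int]] = [head_output_shape(input_size // s, anchors, classes) for s in strides]
total_boxes = sum(g * g * anchors for g, _, _ in shapes)

print(shapes)       # [(52, 52, 255), (26, 26, 255), (13, 13, 255)]
print(total_boxes)  # 10647

parts = split_prediction([0.1, 0.2, 0.3, 0.4, 0.9] + [0.0] * classes, classes)
print(parts["objectness"], len(parts["class_scores"]))  # 0.9 80
```

Under these assumptions, each scale outputs 255 channels per grid cell, and the three scales together yield 10,647 candidate boxes. These are then filtered by score and non-maximum suppression.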