What is Neural Architecture Distillation in vision models?

Updated May 15, 2026

Short answer

Neural Architecture Distillation transfers knowledge from a large teacher model to a smaller student architecture.

Deep explanation

Unlike standard knowledge distillation, which in its basic form transfers only the teacher's output logits, architecture distillation also transfers intermediate feature representations, attention maps, or even structural inductive biases. In vision models, this helps compact architectures (such as MobileNet or tiny ViT variants) learn hierarchical representations similar to those of large models without matching their parameter count or compute cost.
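To make the distinction concrete, here is a minimal NumPy sketch of the three kinds of signal mentioned above: softened logits, intermediate features, and spatial attention maps. It is an illustration under stated assumptions, not a reference implementation. The function names, the linear projection `proj` (standing in for a 1x1 convolution that aligns channel counts), the temperature, and the loss weights are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(kl.mean() * temperature ** 2)

def feature_loss(student_feat, teacher_feat, proj):
    """MSE between teacher features and linearly projected student features.

    student_feat: (N, C_s, H, W), teacher_feat: (N, C_t, H, W), proj: (C_t, C_s).
    The projection acts per pixel, like a 1x1 convolution without bias.
    """
    projected = np.einsum("ts,nshw->nthw", proj, student_feat)
    return float(np.mean((projected - teacher_feat) ** 2))

def attention_map(feat):
    """Spatial attention: channel-wise mean of squared activations, L2-normalised."""
    a = (feat ** 2).mean(axis=1).reshape(feat.shape[0], -1)
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)

def attention_loss(student_feat, teacher_feat):
    """Squared distance between normalised attention maps (channel counts may differ)."""
    diff = attention_map(student_feat) - attention_map(teacher_feat)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Toy example with random tensors standing in for real network activations.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(2, 10))
student_logits = rng.normal(size=(2, 10))
teacher_feat = rng.normal(size=(2, 8, 4, 4))   # wider teacher: 8 channels
student_feat = rng.normal(size=(2, 4, 4, 4))   # narrower student: 4 channels
proj = rng.normal(size=(8, 4))                 # learnable in practice; random here

# Hypothetical weights; in practice these are tuned per task.
alpha, beta, gamma = 1.0, 0.5, 0.5
total = (alpha * logit_kd_loss(student_logits, teacher_logits)
         + beta * feature_loss(student_feat, teacher_feat, proj)
         + gamma * attention_loss(student_feat, teacher_feat))
print("combined distillation loss:", total)
```

In a real training loop these terms would be computed on activations from the actual teacher and student networks, added to the usual supervised loss, and differentiated with an autodiff framework. The attention term needs no projection because averaging over channels removes the channel dimension, which is one reason it suits students that are much narrower than the teacher.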
