On Good Practices for Task-Specific Distillation of Large Pretrained Models
CoRR (2024)
Abstract
Large pretrained visual models exhibit remarkable generalization across
diverse recognition tasks. Yet, real-world applications often demand compact
models tailored to specific problems. Variants of knowledge distillation have
been devised for such a purpose, enabling task-specific compact models (the
students) to learn from a generic large pretrained one (the teacher). In this
paper, we show that the excellent robustness and versatility of recent
pretrained models challenge common practices established in the literature,
calling for a new set of optimal guidelines for task-specific distillation. To
address the lack of samples in downstream tasks, we also show that a variant of
Mixup based on Stable Diffusion complements standard data augmentation. This
strategy eliminates the need for engineered text prompts and improves
distillation of generic models into streamlined specialized networks.
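For concreteness, here is a minimal sketch of the task-specific distillation objective the abstract alludes to, assuming a standard Hinton-style setup in which a compact student matches the temperature-softened outputs of a frozen pretrained teacher; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Task cross-entropy combined with a temperature-softened KL term
    that pulls the student's predictions toward the frozen teacher's."""
    # Supervised loss on the downstream task labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-target matching against the teacher, scaled by T^2 as usual.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd
```

The paper's contribution lies in how this generic recipe is configured for strong pretrained teachers and in the diffusion-based Mixup augmentation, not in the loss itself.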