DMT: Comprehensive Distillation with Multiple Self-supervised Teachers
CoRR (2023)
Abstract
Numerous self-supervised learning paradigms, such as contrastive learning and
masked image modeling, have been proposed to acquire powerful and general
representations from unlabeled data. However, these models are commonly
pretrained within their specific framework alone, failing to consider the
complementary nature of visual representations. To tackle this issue, we
introduce Comprehensive Distillation with Multiple Self-supervised Teachers
(DMT) for pretrained model compression, which leverages the strengths of
multiple off-the-shelf self-supervised models. Our experimental results on
prominent benchmark datasets show that the proposed method significantly
surpasses state-of-the-art competitors while retaining favorable efficiency.
On classification tasks, our DMT framework, using three different
self-supervised ViT-Base teachers, enhances the performance of both small/tiny
models and the base model itself. For dense prediction tasks, DMT raises the
AP/mIoU of standard SSL models on the MS-COCO and ADE20K datasets by 4.0%.
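The abstract does not spell out DMT's training objective, so the following is only a minimal sketch of generic multi-teacher feature distillation, assuming frozen off-the-shelf teachers, one hypothetical projection head per teacher, and a cosine-distance matching loss averaged over teachers; the actual DMT loss and alignment modules may differ.

```python
# Minimal sketch of multi-teacher feature distillation (assumptions, not the
# paper's exact objective): the student is aligned to each frozen teacher via
# its own projection head, and the per-teacher losses are averaged.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTeacherDistiller(nn.Module):
    def __init__(self, student: nn.Module, teachers: list[nn.Module],
                 student_dim: int, teacher_dims: list[int]):
        super().__init__()
        self.student = student
        self.teachers = nn.ModuleList(teachers)
        for t in self.teachers:                 # off-the-shelf teachers stay frozen
            t.requires_grad_(False)
            t.eval()
        # one projection head per teacher maps student features into that
        # teacher's embedding space (hypothetical design choice)
        self.heads = nn.ModuleList(nn.Linear(student_dim, d) for d in teacher_dims)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        s_feat = self.student(images)                        # (B, student_dim)
        with torch.no_grad():
            t_feats = [t(images) for t in self.teachers]     # each (B, teacher_dim_i)
        losses = []
        for head, t_feat in zip(self.heads, t_feats):
            pred = head(s_feat)
            # cosine-distance matching against this teacher, averaged over the batch
            losses.append(1.0 - F.cosine_similarity(pred, t_feat, dim=-1).mean())
        return torch.stack(losses).mean()                    # average over teachers
```

A student trained under such an objective can, in principle, inherit complementary cues from contrastive and masked-image-modeling teachers simultaneously; how DMT weights or reconciles the teachers is not stated in this abstract.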
Keywords
Distillation, Self-supervised Learning, Multiple Teachers