CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
arXiv (2024)
Abstract
In this paper, we present a simple yet effective contrastive knowledge
distillation approach, which can be formulated as a sample-wise alignment
problem with intra- and inter-sample constraints. Unlike traditional knowledge
distillation methods that concentrate on maximizing feature similarities or
preserving class-wise semantic correlations between teacher and student
features, our method attempts to recover the "dark knowledge" by aligning
sample-wise teacher and student logits. Specifically, our method first
minimizes logit differences within the same sample by considering their
numerical values, thus preserving intra-sample similarities. Next, we bridge
semantic disparities by leveraging dissimilarities across different samples.
Note that constraints on intra-sample similarities and inter-sample
dissimilarities can be efficiently and effectively reformulated into a
contrastive learning framework with newly designed positive and negative pairs.
The positive pair consists of the teacher's and student's logits derived from
an identical sample, while the negative pairs are formed by using logits from
different samples. With this formulation, our method benefits from the
simplicity and efficiency of contrastive learning through the optimization of
InfoNCE, yielding a run-time complexity that is far less than O(n^2), where
n represents the total number of training samples. Furthermore, our method
eliminates the need for hyperparameter tuning, particularly of temperature
parameters, and does not rely on large batch sizes. We conduct comprehensive
experiments on three datasets including CIFAR-100, ImageNet-1K, and MS COCO.
Experimental results clearly confirm the effectiveness of the proposed method
on both image classification and object detection tasks. Our source code will
be publicly available at https://github.com/wencheng-zhu/CKD.
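
To make the contrastive formulation concrete, below is a minimal PyTorch sketch, not the authors' released implementation: it treats each sample's teacher/student logit pair as the positive and the remaining in-batch teacher logits as negatives, then optimizes InfoNCE as a cross-entropy over the pairwise similarity matrix. The function name, the cosine normalization of logits, and the use of in-batch negatives are assumptions of this sketch; see the linked repository for the official code.

```python
import torch
import torch.nn.functional as F

def contrastive_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor) -> torch.Tensor:
    """Sample-wise contrastive KD loss (illustrative sketch, not official CKD code).

    student_logits, teacher_logits: [B, C] logits for the same batch of B samples.
    Positive pair:  (student_i, teacher_i) -- logits from the same sample.
    Negative pairs: (student_i, teacher_j), j != i -- logits from different samples.
    """
    # Stop gradients through the teacher; only the student is trained.
    t = teacher_logits.detach()

    # L2-normalize so pairwise scores are cosine similarities
    # (an assumption of this sketch, not necessarily the paper's choice).
    s = F.normalize(student_logits, dim=1)
    t = F.normalize(t, dim=1)

    # sim[i, j] = similarity between student logits of sample i and teacher
    # logits of sample j; the diagonal entries are the positive pairs.
    # No temperature is applied, reflecting the paper's tuning-free claim.
    sim = s @ t.T  # [B, B]

    # InfoNCE reduces to cross-entropy where sample i's "class" is index i:
    # it pulls each same-sample pair together (intra-sample similarity) and
    # pushes logits of different samples apart (inter-sample dissimilarity).
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(sim, targets)

# Example usage with model outputs for one batch:
# student_logits = student(x)   # [B, num_classes]
# teacher_logits = teacher(x)   # [B, num_classes]
# loss = contrastive_kd_loss(student_logits, teacher_logits)
```

With in-batch negatives, each step costs O(B^2) similarity computations for batch size B, so a pass over n samples costs O(nB) with B << n, consistent with the far-below-O(n^2) run-time complexity stated in the abstract.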