Distilling the Knowledge in Data Pruning
arXiv (2024)
Abstract
With the increasing size of datasets used for training neural networks, data
pruning becomes an attractive field of research. However, most current data
pruning algorithms are limited in their ability to preserve accuracy compared
to models trained on the full data, especially in high pruning regimes. In this
paper we explore the application of data pruning while incorporating knowledge
distillation (KD) when training on a pruned subset. That is, rather than
relying solely on ground-truth labels, we also use the soft predictions from a
teacher network pre-trained on the complete data. By integrating KD into
training, we demonstrate significant improvement across datasets, pruning
methods, and pruning fractions. We first establish a theoretical
motivation for employing self-distillation to improve training on pruned data.
Then, we empirically make a compelling and highly practical observation: using
KD, simple random pruning is comparable or superior to sophisticated pruning
methods across all pruning regimes. On ImageNet, for example, we achieve
superior accuracy despite training on a random subset of only 50% of the data.
Additionally, we demonstrate a crucial connection between the pruning factor
and the optimal knowledge distillation weight. This helps mitigate the impact
of samples with noisy labels and low-quality images retained by typical pruning
algorithms. Finally, we make an intriguing observation: when using lower
pruning fractions, larger teachers lead to accuracy degradation, while
surprisingly, employing teachers with a smaller capacity than the student's may
improve results. Our code will be made available.
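
As a rough illustration of the setup described above, the following minimal sketch combines random pruning with a distillation-weighted loss. It is not the authors' implementation: the helper names (`random_prune`, `kd_loss`), the weight `alpha`, the `temperature` parameter, and the specific form of the loss (a convex combination of cross-entropy and a temperature-scaled KL term, as in standard knowledge distillation) are all assumptions made for the example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def random_prune(indices, keep_fraction, seed=0):
    """Keep a uniformly random subset of sample indices (hypothetical helper)."""
    pool = list(indices)
    rng = random.Random(seed)
    keep = max(1, round(keep_fraction * len(pool)))
    return sorted(rng.sample(pool, keep))


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=1.0):
    """Per-sample loss: (1 - alpha) * CE(labels) + alpha * T^2 * KL(teacher || student).

    Assumed form, following standard knowledge distillation; `alpha` plays the
    role of the distillation weight mentioned in the abstract.
    """
    # Cross-entropy against the ground-truth label (temperature 1).
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between temperature-softened teacher and student outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# Usage: keep a random 50% of a toy dataset, then score one retained sample.
kept = random_prune(range(10), keep_fraction=0.5)
loss = kd_loss([1.2, 0.3, -0.5], [2.0, 0.1, -1.0], label=0, alpha=0.7)
```

Under this assumed form, increasing `alpha` shifts the training signal from the ground-truth labels toward the teacher's soft predictions, which is one way to reduce reliance on noisy labels in the retained subset.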