Improving relation classification effectiveness by alternate distillation

Zhaoguo Wang, Kai Li, Yuxin Ye

Appl. Intell. (2023)

Abstract
With the development of neural networks, increasingly complex and powerful relation classification models are continually being proposed. Although such models can be compressed with model compression methods at the cost of effectiveness, they often remain too large to deploy on resource-constrained devices. Knowledge distillation can transfer the strong predictive ability of superior models to lightweight models, but the gap between models limits its effect. Because the gaps between relation classification models are large, it is particularly difficult to select and train a superior teacher model to guide student models when using knowledge distillation to obtain a lightweight model. How to obtain a lightweight yet highly effective relation classification model therefore remains an active research topic. In this paper, we construct an alternate distillation framework with three modules. The weight-adaptive external distillation module is built on an adaptive weighting scheme based on cosine similarity. The progressive internal distillation module allows the model to act as its own teacher and guide its own training. Finally, a combination module based on the attention mechanism combines the two modules above. On the SemEval-2010 Task 8 and Wiki80 datasets, we demonstrate that our approach substantially improves the relation classification effectiveness of lightweight models.

Graphical Abstract

Complex relation classification models compressed at the cost of effectiveness are still difficult to deploy on resource-constrained devices. Moreover, because of the significant differences between relation classification models, it is challenging to find a suitable teacher model for knowledge distillation. In this paper, we propose an alternate distillation framework (comprising external distillation and internal distillation) to obtain lightweight relation classification models with high effectiveness. Our approach effectively transfers the strong predictive capability of complex models to lightweight models even when there is a significant gap between them.
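As a rough illustration of the cosine-similarity-based adaptive weighting mentioned in the abstract, the following PyTorch-style sketch blends a soft-label distillation term with a hard-label term, weighting the distillation term by how similar the student's and teacher's logits are. The function name, temperature value, and the exact weighting rule are assumptions for illustration only, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def external_distillation_loss(student_logits, teacher_logits, labels, temperature=4.0):
    """Hypothetical cosine-similarity-weighted distillation loss (illustrative sketch)."""
    # Adaptive weight: mean cosine similarity between student and teacher logits,
    # mapped from [-1, 1] into [0, 1] so it can act as a mixing coefficient.
    cos = F.cosine_similarity(student_logits, teacher_logits, dim=-1).mean()
    alpha = (cos + 1.0) / 2.0

    # Soft-label term: KL divergence between temperature-softened distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Hard-label term: standard cross-entropy against the gold relation labels.
    ce = F.cross_entropy(student_logits, labels)

    # Blend the two terms with the adaptive weight.
    return alpha * kd + (1.0 - alpha) * ce
```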
Keywords
Relation classification, Deep neural network, Effectiveness, Knowledge distillation