Heterogeneous Student Knowledge Distillation From BERT Using a Lightweight Ensemble Framework

IEEE Access (2024)

Abstract
Deep learning models have demonstrated their effectiveness in capturing complex relationships between input features and target outputs across many application domains. These models, however, often come with considerable memory and computational demands, posing challenges for deployment on resource-constrained edge devices. Knowledge distillation is a prominent technique for transferring expertise from a powerful but heavy teacher model to a leaner, more efficient student model. Since ensemble methods have shown notable gains in generalization and have achieved state-of-the-art performance in various machine learning tasks, we adopt ensemble techniques to distill knowledge from BERT into multiple lightweight student models. Our approach combines lean spatial and sequential architectures, namely CNN, LSTM, and their fusion, so that each student processes the data from a distinct perspective. Instead of contextual word representations, which demand more memory in natural language processing applications, we share a single static, pre-trained, low-dimensional word embedding space among the student models. Empirical studies on sentiment classification show that our model outperforms not only existing techniques but also the teacher model.
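To make the setup concrete, below is a minimal sketch (not the authors' released code) of how such a distillation might look in PyTorch: lightweight LSTM and CNN students share one frozen, static, low-dimensional embedding, and each student is trained against a BERT teacher's soft predictions plus the hard labels. All module names, hyperparameters, and the averaging ensemble are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMStudent(nn.Module):
    # Sequential student: BiLSTM over the shared static embedding.
    def __init__(self, embedding, hidden=128, num_classes=2):
        super().__init__()
        self.embedding = embedding
        self.lstm = nn.LSTM(embedding.embedding_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)              # (B, T, D)
        _, (h, _) = self.lstm(x)
        h = torch.cat([h[-2], h[-1]], dim=-1)      # concat both directions
        return self.fc(h)                          # class logits

class CNNStudent(nn.Module):
    # Spatial student: multi-width 1D convolutions with max-over-time pooling.
    def __init__(self, embedding, channels=100, kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = embedding
        self.convs = nn.ModuleList(
            nn.Conv1d(embedding.embedding_dim, channels, k) for k in kernel_sizes)
        self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)          # (B, D, T)
        feats = [F.relu(c(x)).max(dim=-1).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=-1))

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target KL term from the BERT teacher plus hard-label cross-entropy.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# One static, low-dimensional embedding (e.g. pre-trained GloVe vectors),
# frozen and shared by every student; vocabulary size is illustrative.
shared_emb = nn.Embedding(30000, 100)
shared_emb.weight.requires_grad = False
students = [LSTMStudent(shared_emb), CNNStudent(shared_emb)]

# teacher_logits would come from a fine-tuned BERT run offline; at inference
# the ensemble prediction could simply average the students' logits.

The key memory saving in this sketch is that the students never load BERT or its contextual representations at inference time; they only look up the shared static embedding table.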
Keywords
Knowledge distillation, ensemble methods, BERT, LSTM, CNN, contextual word representations, pre-trained and low-dimensional word embedding space, sentiment classification problem