An Empirical Study on Adaptive Inference for Pretrained Language Model

IEEE Transactions on Neural Networks and Learning Systems (2023)

Abstract
Adaptive inference has been proven to improve the inference speed of bidirectional encoder representations from transformers (BERT) with minimal loss of accuracy. However, current work focuses only on the BERT model and lacks exploration of other pretrained language models (PLMs). Therefore, this article conducts an empirical study on the application of the adaptive inference mechanism to various PLMs, including generative pretraining (GPT), GCNN, ALBERT, and TinyBERT. The mechanism is verified on both English and Chinese benchmarks, and the experimental results demonstrate that it can speed up inference by a factor of 1 to 10, depending on the chosen speed threshold. In addition, its application to ALBERT shows that adaptive inference can work together with parameter sharing, achieving model compression and acceleration simultaneously, while its application to TinyBERT shows that it can further accelerate an already distilled small model. For the problem that a large number of labels renders adaptive inference ineffective, this article also proposes a solution, namely label reduction. Finally, this article open-sources an easy-to-use toolkit called FastPLM to help developers adopt pretrained models with adaptive inference capabilities in their applications.
Keywords
Adaptation models, Bit error rate, Task analysis, Inference mechanisms, Transformers, Computational modeling, Mathematical models, Adaptive inference, bidirectional encoder representations from transformers (BERT), distillation, FastPLM, pretrained language model (PLM)