Component Fusion: Learning Replaceable Language Model Component For End-To-End Speech Recognition System

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Abstract
Recently, attention-based end-to-end automatic speech recognition (ASR) systems have shown promising results. One limitation of an attention-based ASR system is that its language model (LM) component has to be learned implicitly from transcribed speech data, which prevents one from using abundant text-only corpora to improve language modeling. In this work, the Component Fusion method is proposed to incorporate an externally trained neural network (NN) LM into an attention-based ASR system. During the training stage, the attention-based system is equipped with an additional LM component, which is replaced by an externally trained NN LM at the decoding stage. Experimental results show that the proposed Component Fusion outperforms two prior LM fusion approaches, Shallow Fusion and Cold Fusion, in both out-of-domain and in-domain scenarios. Further improvements can be achieved by combining Component Fusion with Shallow Fusion.
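For readers unfamiliar with the Shallow Fusion baseline named in the abstract, the sketch below illustrates its commonly used formulation: at each decoding step, the ASR decoder's log-probability for a token is added to a weighted log-probability from an external LM. This is not the Component Fusion method itself, whose details the abstract does not give. The function names, the toy vocabulary, the probabilities, and the `lm_weight` value are all assumptions made for illustration.

```python
import math


def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight=0.5):
    """Fuse per-token scores for one decoding step (illustrative sketch).

    Both arguments map token -> log-probability over the same vocabulary.
    lm_weight is a hypothetical interpolation weight, normally tuned on
    held-out data.
    """
    return {
        token: asr_lp + lm_weight * lm_log_probs[token]
        for token, asr_lp in asr_log_probs.items()
    }


def best_token(scores):
    """Return the token with the highest score."""
    return max(scores, key=scores.get)


# Toy distributions (made-up numbers): the ASR decoder slightly prefers
# "their", while the external LM strongly prefers "there" in this context.
asr = {"their": math.log(0.5), "there": math.log(0.4), "they're": math.log(0.1)}
lm = {"their": math.log(0.2), "there": math.log(0.7), "they're": math.log(0.1)}

fused = shallow_fusion_step(asr, lm, lm_weight=0.5)
print(best_token(asr))    # choice from ASR scores alone
print(best_token(fused))  # choice after fusing in the external LM
```

In this toy case the external LM changes the selected token, which is the effect Shallow Fusion relies on. In a real system the same score combination would be applied inside beam search rather than to a single greedy step.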
Keywords
automatic speech recognition, end-to-end speech recognition, attention-based model, language model