An investigation of phone-based subword units for end-to-end speech recognition

INTERSPEECH (2020)

Abstract
Phones and their context-dependent variants have been the standard modeling units for conventional speech recognition systems, while characters and character-based subwords are becoming increasingly popular for end-to-end recognition systems. We investigate the use of phone-based subwords, in particular those obtained with byte pair encoding (BPE), as modeling units for end-to-end speech recognition, and develop multi-level language model-based decoding algorithms that rely on a pronunciation dictionary. Apart from the lexicon, which is readily available, our system requires none of the additional expert knowledge or processing steps used in conventional systems. Experimental results show that phone-based BPEs lead to more accurate recognition systems than their character-based counterparts, and that further improvement can be obtained with the newly developed one-pass beam search decoder, which efficiently combines phone-based and character-based BPE systems. On Switchboard, our phone-based BPE system achieves word error rates (WER) of 7.9%/16.1% on the Switchboard/CallHome portions of the test set, while the ensemble system achieves 7.2%/15.0% WER.
Keywords
end-to-end speech recognition, byte pair encoding, multi-level language model, one-pass decoding
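To make the idea of phone-based BPE units concrete, here is a minimal sketch of generic BPE merge learning applied to phone sequences instead of characters. It is not the authors' implementation: the toy lexicon, the function names, the underscore used to join merged phones, the choice to count each lexicon entry once (rather than weighting by corpus frequency), and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy pronunciation lexicon (word -> phone sequence).
# The entries are illustrative only and are not taken from the paper.
LEXICON = {
    "cat": ["K", "AE", "T"],
    "cats": ["K", "AE", "T", "S"],
    "bat": ["B", "AE", "T"],
    "bats": ["B", "AE", "T", "S"],
}


def count_pairs(sequences):
    """Count adjacent unit pairs over all sequences."""
    pairs = Counter()
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            pairs[(left, right)] += 1
    return pairs


def merge_pair(seq, pair):
    """Replace every occurrence of `pair` in `seq` with one merged unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "_" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_phone_bpe(lexicon, num_merges):
    """Greedily learn up to `num_merges` merges over phone sequences.

    Assumption: each lexicon entry counts once. A real system would
    typically weight pairs by how often each word occurs in the corpus.
    """
    segmented = {word: list(phones) for word, phones in lexicon.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(segmented.values())
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself so the
        # result is deterministic (an arbitrary choice for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        segmented = {w: merge_pair(s, best) for w, s in segmented.items()}
    return merges, segmented


if __name__ == "__main__":
    merges, segmented = learn_phone_bpe(LEXICON, num_merges=2)
    print(merges)
    print(segmented)
```

In this toy lexicon the pair `AE T` occurs in every word, so it is merged first and becomes a single phone-based subword unit. A character-based BPE would run the same procedure over spellings rather than pronunciations.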