Fix It Where It Fails: Pronunciation Learning By Mining Error Corrections From Speech Logs

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2015)

引用 13|浏览100
暂无评分
摘要
The pronunciation dictionary, or lexicon, is an essential component in an automatic speech recognition (ASR) system in that incorrect pronunciations cause systematic misrecognitions. It typically consists of a list of word-pronunciation pairs written by linguists, and a grapheme-to-phoneme (G2P) engine to generate pronunciations for words not in the list. The hand-generated list can never keep pace with the growing vocabulary of a live speech recognition system, and the G2P is usually of limited accuracy. This is especially true for proper names whose pronunciations may be influenced by various historical or foreign-origin factors. In this paper, we propose a language-independent approach to detect misrecognitions and their corrections from voice search logs. We learn previously unknown pronunciations from this data, and demonstrate that they significantly improve the quality of a production-quality speech recognition system.
更多
查看译文
关键词
speech recognition,pronunciation learning,data extraction,logistic regression
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要