Fix It Where It Fails: Pronunciation Learning By Mining Error Corrections From Speech Logs
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
Abstract
The pronunciation dictionary, or lexicon, is an essential component in an automatic speech recognition (ASR) system in that incorrect pronunciations cause systematic misrecognitions. It typically consists of a list of word-pronunciation pairs written by linguists, and a grapheme-to-phoneme (G2P) engine to generate pronunciations for words not in the list. The hand-generated list can never keep pace with the growing vocabulary of a live speech recognition system, and the G2P is usually of limited accuracy. This is especially true for proper names whose pronunciations may be influenced by various historical or foreign-origin factors. In this paper, we propose a language-independent approach to detect misrecognitions and their corrections from voice search logs. We learn previously unknown pronunciations from this data, and demonstrate that they significantly improve the quality of a production-quality speech recognition system.
Keywords
speech recognition, pronunciation learning, data extraction, logistic regression