Cross-Lingual Phoneme Mapping For Language Robust Contextual Speech Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018)

Abstract
Standard automatic speech recognition (ASR) systems are increasingly expected to recognize foreign entities, yet doing so while preserving accuracy on native words remains a challenge. We describe a novel approach for recognizing foreign words by injecting them, with appropriate pronunciations, into the recognizer decoder search space on the fly. The pronunciations are generated by mapping pronunciations from the foreign language's lexicon to the target recognizer language's phoneme inventory. The phoneme mapping itself is learned automatically using acoustic coupling of text-to-speech (TTS) audio and a pronunciation learning algorithm. Evaluation of our algorithm on Google Assistant use cases shows that we can improve recognition of media-related queries by incorporating English entity pronunciations into French and German recognizers, with win/loss ratios of roughly 2-3:1, without hurting recognition on general traffic.
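As a rough illustration of the mapping step described in the abstract, the sketch below rewrites a foreign-lexicon pronunciation into a target phoneme inventory using a lookup table. The table `EN_TO_FR`, its phoneme symbols, and the function `map_pronunciation` are hypothetical and hand-written for this example. The paper learns its mapping automatically from TTS audio, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy table: English phoneme symbols -> French phoneme symbols.
# Each source phoneme maps to zero, one, or several target phonemes.
# These entries are assumptions for illustration, not the learned mapping.
EN_TO_FR: Dict[str, Tuple[str, ...]] = {
    "k": ("k",),
    "w": ("w",),
    "i": ("i",),
    "n": ("n",),
    "t": ("t",),
    "I": ("i",),   # assumed: lax vowel folded into the nearest French vowel
    "T": ("s",),   # assumed: English "th" approximated by /s/
    "h": (),       # assumed: /h/ dropped, since the target inventory lacks it
}


def map_pronunciation(
    source: Sequence[str], mapping: Dict[str, Tuple[str, ...]]
) -> List[str]:
    """Rewrite a source-language phoneme sequence into target phonemes."""
    target: List[str] = []
    for phoneme in source:
        if phoneme not in mapping:
            # Fail loudly rather than silently emitting an unmapped symbol.
            raise KeyError(f"no mapping for source phoneme {phoneme!r}")
        target.extend(mapping[phoneme])
    return target


if __name__ == "__main__":
    # A made-up English lexicon entry, mapped into the toy French inventory.
    print(map_pronunciation(["h", "I", "t"], EN_TO_FR))
```

In a setup like the one the abstract describes, the mapped sequence would then be attached to the foreign entity when it is added to the decoder search space. That step depends on the recognizer and is left out here.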
Keywords
cross-lingual, speech recognition