Cross-Lingual Phoneme Mapping For Language Robust Contextual Speech Recognition

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018

Abstract
Standard automatic speech recognition (ASR) systems are increasingly expected to recognize foreign entities, yet doing so while preserving accuracy on native words remains a challenge. We describe a novel approach for recognizing foreign words by injecting them, with appropriate pronunciations, into the recognizer's decoder search space on the fly. The pronunciations are generated by mapping pronunciations from the foreign language's lexicon to the phoneme inventory of the target recognizer language. The phoneme mapping itself is learned automatically using acoustic coupling of text-to-speech (TTS) audio and a pronunciation learning algorithm. Evaluation of our algorithm on Google Assistant use cases shows that incorporating English entity pronunciations into French and German recognizers improves recognition of media-related queries, with win/loss ratios of roughly 2:1 to 3:1, without hurting recognition on general traffic.
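The pronunciation-generation step rewrites each foreign-lexicon pronunciation into the target recognizer's phoneme inventory. The sketch below is a minimal illustration under stated assumptions, not the paper's method. It assumes the mapping is a one-to-one table, estimated by majority vote over already-aligned (source phoneme, target phoneme) pairs. The paper instead learns the mapping through acoustic coupling of TTS audio with a pronunciation learning algorithm, which is not reproduced here. The function names, the ARPAbet-style source symbols, the target symbols, and the pair counts are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_phoneme_map(aligned_pairs):
    """Map each source phoneme to the target phoneme it was most often aligned with.

    Hypothetical simplification: aligned_pairs is an iterable of
    (source_phoneme, target_phoneme) tuples, assumed to come from some
    alignment of foreign-lexicon pronunciations with target-language ones.
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    # Majority vote per source phoneme.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def map_pronunciation(pron, phoneme_map):
    """Rewrite a source-language pronunciation in the target phoneme inventory.

    Raises KeyError if a source phoneme has no learned mapping.
    """
    return [phoneme_map[p] for p in pron]


# Hypothetical toy alignments (English-style source, French-style target symbols).
aligned = [
    ("TH", "s"), ("TH", "s"), ("TH", "f"),
    ("IH", "i"), ("IH", "i"), ("IH", "e"),
    ("NG", "n"), ("NG", "n"), ("NG", "N"),
]
phoneme_map = learn_phoneme_map(aligned)
print(map_pronunciation(["TH", "IH", "NG"], phoneme_map))  # -> ['s', 'i', 'n']
```

A one-to-one table keeps the example short. A realistic mapping would likely need to handle one-to-many mappings, deletions, and several alternative pronunciations per word before they are injected into the decoder.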
Keywords
cross-lingual, speech recognition