Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Second-pass rescoring is a well-known technique for improving the performance of Automatic Speech Recognition (ASR) systems. Neural Oracle Search (NOS) selects the most likely hypothesis from an N-best hypothesis list by integrating information from multiple sources: the input acoustic representations, the N-best hypotheses, additional first-pass statistics, and unpaired textual information through an external language model. It has shown success in rescoring for RNN-T first-pass models. Multilingual first-pass speech recognition models often outperform their monolingual counterparts when trained on related or low-resource languages. In this paper, we investigate the use of the NOS rescoring model on a first-pass multilingual model and show that, like the first-pass model, the rescoring model can be made multilingual. Our first-pass multilingual model does not require a language-id, and we make the realistic assumption that an estimate of the language-id is available for second-pass rescoring. To analyze the performance of the multilingual NOS rescorer under different settings, we conduct comprehensive experiments on two sets of languages: one consisting of related low-resource languages, and the other adding a high-resource language to the first set. Our experimental results show that multilingual NOS can improve on the first-pass multilingual model, yielding an average word error rate reduction of 9.4% in the first case and 8.4% in the second, and outperforming its monolingual counterparts in both cases.
speech recognition, multilingual, RNN-T, N-best rescoring
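The general idea of second-pass N-best rescoring described in the abstract can be illustrated with a minimal sketch. This is not the NOS model, which is a neural network that also uses acoustic representations and other first-pass statistics. The sketch substitutes a simple log-linear combination of a first-pass score and an external language model score. The `Hypothesis` class, the `rescore` function, the `lm_weight` parameter, and all score values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of an N-best list (all fields are illustrative assumptions)."""
    text: str
    first_pass_logp: float  # log-probability assigned by the first-pass model
    lm_logp: float          # log-probability assigned by an external language model


def rescore(nbest, lm_weight=0.5):
    """Return the hypothesis with the highest combined score.

    Combined score = first-pass log-prob + lm_weight * LM log-prob.
    This log-linear interpolation is a stand-in for a learned rescorer.
    """
    if not nbest:
        raise ValueError("N-best list is empty")
    return max(nbest, key=lambda h: h.first_pass_logp + lm_weight * h.lm_logp)


# Made-up scores: the first pass slightly prefers the second hypothesis,
# but the external language model strongly prefers the first one.
nbest = [
    Hypothesis("recognise speech", first_pass_logp=-4.0, lm_logp=-6.0),
    Hypothesis("wreck a nice beach", first_pass_logp=-3.8, lm_logp=-9.0),
]

best = rescore(nbest)
print(best.text)
```

With `lm_weight=0.0` the function falls back to the first-pass ranking, which shows how the second pass can change the selected hypothesis without modifying the first-pass model.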