Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems.

Interspeech (2021)

Abstract
Improving multilingual end-to-end (E2E) automatic speech recognition (ASR) systems is worthwhile because such systems have manifold advantages: they simplify the training strategy, are easier to scale, and exhibit better performance than monolingual models. However, it remains challenging to use a single multilingual model to recognize multiple languages without knowing the input language, as most multilingual models assume that the input language is known. In this paper, we introduce a multi-softmax model to improve multilingual recurrent neural network transducer (RNN-T) models by using language-specific softmax, joint and embedding layers while sharing the rest of the parameters. We extend the multi-softmax model to work without knowing the input language by integrating a language identification (LID) model that estimates the language on the fly while simultaneously performing recognition. The multi-softmax model outperforms monolingual models with an average relative word error rate reduction (WERR) of 4.65% on Indian languages. Fine-tuning further improves the WERR to 12.2%. The multi-softmax model with on-the-fly LID estimation shows a WERR of 13.86% compared to the multilingual baseline.
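The abstract describes routing each decoding step through language-specific layers on top of shared parameters, with an LID estimate choosing the language when it is unknown. The following is a minimal Python sketch of that idea under heavy assumptions: the shared encoder output is a precomputed vector, the shared prediction network is reduced to a single tanh, each hypothetical `LanguageHead` holds its own embedding table and a toy linear joint projection, and the LID posterior is used by hard argmax selection. Layer sizes, the real joint formulation, and how the authors actually combine LID with decoding are not given in the abstract, so this is an illustration, not the paper's implementation. The hypothetical `werr` helper shows how a relative WER reduction like those reported above is computed.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over one language's vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shared_prediction(embedded: List[float]) -> List[float]:
    # Hypothetical shared prediction network (shared across languages);
    # a parameter-free tanh stands in for the real recurrent layers.
    return [math.tanh(x) for x in embedded]


@dataclass
class LanguageHead:
    """Hypothetical language-specific layers: embedding, joint projection, softmax."""
    embedding: Dict[str, List[float]]  # previous-token embedding for this language
    weights: List[List[float]]         # joint projection, one row per vocabulary token
    vocab: List[str]                   # this language's output tokens

    def distribution(self, enc: List[float], prev_token: str) -> Dict[str, float]:
        pred = shared_prediction(self.embedding[prev_token])
        # Toy joint network: add encoder and prediction features, apply tanh,
        # then project onto this language's vocabulary only.
        hidden = [math.tanh(e + p) for e, p in zip(enc, pred)]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.weights]
        return dict(zip(self.vocab, softmax(logits)))


def multi_softmax_step(
    heads: Dict[str, LanguageHead],
    enc: List[float],
    prev_token: str,
    lid_posterior: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    # Assumption: pick the single most likely language from the LID estimate
    # and score this step with that language's head.
    lang = max(lid_posterior, key=lid_posterior.get)
    return lang, heads[lang].distribution(enc, prev_token)


def werr(baseline_wer: float, new_wer: float) -> float:
    # Relative word error rate reduction, in percent.
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    heads = {
        "hi": LanguageHead({"<s>": [0.0, 0.0]}, [[1.0, 0.0], [0.0, 1.0]], ["hi_a", "hi_b"]),
        "ta": LanguageHead({"<s>": [0.0, 0.0]}, [[0.5, 0.5], [-0.5, 0.5]], ["ta_a", "ta_b"]),
    }
    lang, probs = multi_softmax_step(heads, [1.0, -1.0], "<s>", {"hi": 0.8, "ta": 0.2})
    print(lang, probs)
    print(werr(10.0, 9.0))  # prints 10.0
```

Hard selection keeps each step's output restricted to one language's vocabulary; weighting the per-language distributions by the LID posterior would be an alternative, and the abstract does not say which approach the authors use.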
Keywords
multilingual, RNN-T, end-to-end, language identification, streaming ASR