End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
This paper proposes end-to-end joint modeling of conversation history-dependent and history-independent automatic speech recognition (ASR) systems. Conversation histories are available in some ASR applications, such as meeting transcription, but not in others, such as voice search. So far, these two types of ASR systems have been constructed individually using different models, which is inefficient for each application. In fact, conventional conversation history-dependent ASR systems can perform both history-dependent and history-independent processing. However, their performance is inferior to that of history-independent ASR systems. This is because the model architecture and training criterion of conventional conversation history-dependent ASR systems are specialized for the case where conversation histories are available. To address this problem, the proposed end-to-end joint modeling method uses a crossmodal transformer-based architecture that can flexibly switch between using and not using the conversation histories. In addition, we propose multi-history training, which simultaneously utilizes a dataset without histories and datasets with various histories to effectively improve both types of ASR processing by introducing a unified architecture. Experiments on Japanese ASR tasks demonstrate the effectiveness of the proposed method.
Keywords
end-to-end speech recognition, conversation-history, crossmodal transformer, multi-history training
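
The multi-history training idea in the abstract (training on a dataset without histories together with datasets with various histories) can be illustrated with a small data-preparation sketch. This is not the authors' implementation. It assumes, as one possible reading, that "various histories" means varying numbers of preceding utterance transcripts; the abstract does not specify this. The function name `build_multi_history_examples`, the `max_history` parameter, and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one utterance is (audio_id, transcript).
Utterance = Tuple[str, str]


def build_multi_history_examples(
    conversation: List[Utterance], max_history: int = 2
) -> List[Dict[str, object]]:
    """Expand one conversation into training examples with 0..max_history
    preceding transcripts as history (an assumed reading of "various histories")."""
    examples: List[Dict[str, object]] = []
    for i, (audio_id, transcript) in enumerate(conversation):
        for k in range(max_history + 1):
            if k > i:
                break  # not enough preceding utterances for this history length
            # k == 0 gives an empty history, i.e. the history-independent case.
            history = [text for _, text in conversation[i - k:i]]
            examples.append(
                {"audio": audio_id, "history": history, "target": transcript}
            )
    return examples


# Toy conversation with placeholder audio IDs and transcripts.
conversation = [("utt1", "hello"), ("utt2", "how are you"), ("utt3", "fine thanks")]
examples = build_multi_history_examples(conversation, max_history=2)
for ex in examples:
    print(ex["audio"], ex["history"], "->", ex["target"])
```

Under this assumption, a single model would see each utterance both with an empty history and with non-empty histories, so the same parameters are trained for both modes of use.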