Synthetic Conversations Improve Multi-Talker ASR
ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (2024)
Abstract
In recent years, automatic speech recognition (ASR) has made remarkable progress, particularly in recognizing a single dominant speaker. Nevertheless, multi-talker scenarios, which require distinguishing between speakers and transcribing each one's speech accurately, remain unsolved, largely because limited data constrains model effectiveness. In this study, we propose a novel methodology called Systematic Synthetic Conversations (SSC), which leverages conventional ASR datasets to help an end-to-end (E2E) multi-talker ASR model establish new state-of-the-art results across synthetic and authentic multi-talker datasets. Notably, we achieve a 3.47% word error rate (WER) on the Libri2Mix [1] set, and WERs of 13.96% and 19.51% on the AMI-IHM and AMI-SDM [2] sets, respectively. These outcomes underscore the hidden potential of existing resources for tackling complicated multi-talker problems in ASR.
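The core idea of leveraging conventional (single-speaker) ASR datasets for multi-talker training is to overlay independent utterances into overlapping mixtures, as is done for datasets like Libri2Mix. The paper's SSC pipeline is not reproduced here; the sketch below only illustrates the general mixing step, and the function name, `overlap_ratio` parameter, and use of random noise as stand-in audio are all assumptions for illustration.

```python
import numpy as np

def make_synthetic_mixture(wav_a, wav_b, overlap_ratio=0.5):
    """Overlay two single-speaker utterances into a two-talker sample.

    Hypothetical sketch of synthetic-conversation generation:
    speaker B starts partway through speaker A's utterance, so the
    two signals overlap for roughly `overlap_ratio` of utterance A.
    """
    # Delay speaker B so the utterances partially overlap.
    offset = int(len(wav_a) * (1.0 - overlap_ratio))
    total = max(len(wav_a), offset + len(wav_b))
    mix = np.zeros(total, dtype=np.float32)
    mix[: len(wav_a)] += wav_a
    mix[offset : offset + len(wav_b)] += wav_b
    return mix

# Toy example with random noise standing in for speech waveforms.
rng = np.random.default_rng(0)
a = rng.standard_normal(16000).astype(np.float32)  # ~1 s at 16 kHz
b = rng.standard_normal(8000).astype(np.float32)   # ~0.5 s at 16 kHz
mix = make_synthetic_mixture(a, b, overlap_ratio=0.5)
```

A real pipeline would additionally scale the sources to a target signal-to-noise ratio and pair the mixture with both reference transcripts (with speaker labels) as the training target.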
Keywords
multi-talker, ASR, synthetic conversation