Synthetic Conversations Improve Multi-Talker ASR

Thai-Binh Nguyen, Alexander Waibel

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
In recent times, automatic speech recognition (ASR) has seen remarkable progress, particularly in recognizing dominant speakers. Nevertheless, multi-talker scenarios, which require distinguishing between speakers and transcribing their speech accurately, remain unsolved, largely because limited training data constrains model effectiveness. In this study, we propose a novel methodology called Systematic Synthetic Conversations (SSC), which leverages conventional ASR datasets to help an end-to-end (E2E) multi-talker ASR model establish new state-of-the-art results across synthetic and authentic multi-talker datasets. Notably, we achieved a 3.47% word error rate (WER) on the Libri2Mix [1] set, and WERs of 13.96% and 19.51% on the AMI-IHM and AMI-SDM [2] sets, respectively. These outcomes underscore the hidden potential of existing resources for tackling the complex multi-talker problem within the domain of ASR.
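The abstract does not specify how SSC assembles conversations from single-speaker data. As a rough illustration of the general idea only (overlapping utterances from a conventional ASR corpus into a multi-talker mixture), the sketch below mixes two waveforms with a configurable overlap; the function name, the overlap_ratio parameter, and the gain handling are hypothetical choices for this example, not the paper's actual SSC procedure.

import numpy as np

def mix_two_utterances(wav_a: np.ndarray, wav_b: np.ndarray,
                       sample_rate: int = 16000,
                       overlap_ratio: float = 0.5,
                       gain_db_b: float = 0.0):
    """Start wav_b partway through wav_a so the two speakers overlap."""
    offset = int(len(wav_a) * (1.0 - overlap_ratio))   # sample index where speaker B starts
    total_len = max(len(wav_a), offset + len(wav_b))
    mixture = np.zeros(total_len, dtype=np.float32)
    mixture[:len(wav_a)] += wav_a
    gain_b = 10.0 ** (gain_db_b / 20.0)                # relative level of speaker B
    mixture[offset:offset + len(wav_b)] += gain_b * wav_b
    peak = np.max(np.abs(mixture)) + 1e-8              # peak-normalize to avoid clipping
    return mixture / peak, offset / sample_rate        # mixture and B's start time in seconds

if __name__ == "__main__":
    sr = 16000
    # Stand-ins for real single-speaker waveforms (e.g. two LibriSpeech utterances).
    utt_a = np.random.default_rng(1).standard_normal(3 * sr).astype(np.float32)
    utt_b = np.random.default_rng(2).standard_normal(2 * sr).astype(np.float32)
    mix, b_start = mix_two_utterances(utt_a, utt_b, sample_rate=sr)
    print(f"mixture: {len(mix) / sr:.2f}s, speaker B starts at {b_start:.2f}s")

In a real pipeline one would pair the mixture with both reference transcripts (and speaker labels) so the E2E multi-talker model can be trained to separate and transcribe the overlapping speech.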
Keywords
multi-talker, ASR, synthetic conversation