Synthetic Conversations Improve Multi-Talker ASR
ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (2024)
Abstract
In recent years, automatic speech recognition (ASR) has made remarkable progress, particularly in recognizing a single dominant speaker. Nevertheless, multi-talker scenarios, which require distinguishing between speakers and transcribing each one's speech accurately, remain unsolved, largely because limited data constrains model effectiveness. In this study, we propose a novel methodology called Systematic Synthetic Conversations (SSC), which leverages conventional ASR datasets to help an end-to-end (E2E) multi-talker ASR model establish new state-of-the-art results across synthetic and authentic multi-talker datasets. Notably, we achieve a 3.47% word error rate (WER) on the Libri2Mix [1] set, and WERs of 13.96% and 19.51% on the AMI-IHM and AMI-SDM [2] sets, respectively. These outcomes underscore the hidden potential of existing resources for tackling complicated multi-talker problems in ASR.
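The core idea of leveraging conventional (single-speaker) ASR datasets for multi-talker training is to overlay independent utterances into overlapping mixtures, as is done for datasets like Libri2Mix. The paper's SSC pipeline is not reproduced here; the sketch below only illustrates the general mixing step, and the function name, `overlap_ratio` parameter, and use of random noise as stand-in audio are all assumptions for illustration.

```python
import numpy as np

def make_synthetic_mixture(wav_a, wav_b, overlap_ratio=0.5):
    """Overlay two single-speaker utterances into a two-talker sample.

    Hypothetical sketch of synthetic-conversation generation:
    speaker B starts partway through speaker A's utterance, so the
    two signals overlap for roughly `overlap_ratio` of utterance A.
    """
    # Delay speaker B so the utterances partially overlap.
    offset = int(len(wav_a) * (1.0 - overlap_ratio))
    total = max(len(wav_a), offset + len(wav_b))
    mix = np.zeros(total, dtype=np.float32)
    mix[: len(wav_a)] += wav_a
    mix[offset : offset + len(wav_b)] += wav_b
    return mix

# Toy example with random noise standing in for speech waveforms.
rng = np.random.default_rng(0)
a = rng.standard_normal(16000).astype(np.float32)  # ~1 s at 16 kHz
b = rng.standard_normal(8000).astype(np.float32)   # ~0.5 s at 16 kHz
mix = make_synthetic_mixture(a, b, overlap_ratio=0.5)
```

A real pipeline would additionally scale the sources to a target signal-to-noise ratio and pair the mixture with both reference transcripts (with speaker labels) as the training target.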
Keywords
multi-talker, ASR, synthetic conversation