USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)
Abstract
We introduce a multilingual speaker change detection model (USM-SCD) that can
simultaneously detect speaker turns and perform ASR for 96 languages. This
model is adapted from a speech foundation model trained on a large quantity of
supervised and unsupervised data, demonstrating the utility of fine-tuning from
a large generic foundation model for a downstream task. We analyze the
performance of this multilingual speaker change detection model through a
series of ablation studies. We show that the USM-SCD model can achieve more
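than 75% average speaker change detection F1 score across a test set that
consists of data from 96 languages. On American English, the USM-SCD model can
achieve an 85.8% speaker change detection F1 score across various public and
internal test sets, beating the previous monolingual baseline model by 21%
relative. We also show that we only need to fine-tune one-quarter of the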
trainable model parameters to achieve the best model performance. The USM-SCD
model exhibits state-of-the-art ASR quality compared with a strong public ASR
baseline, making it suitable to handle both tasks with negligible additional
computational cost.
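
The abstract reports results as F1 scores and as a relative improvement over a baseline. As a rough illustration of how these two quantities are usually computed, here is a minimal sketch. The function names, the detection counts, and the example scores are hypothetical and are not taken from the paper, and the abstract does not say how predicted speaker changes are matched against reference ones.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 is the harmonic mean of precision and recall, from raw detection counts."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Gain of `new` over `baseline`, expressed as a fraction of the baseline."""
    return (new - baseline) / baseline


# Hypothetical counts, chosen only for illustration (not from the paper).
example_f1 = f1_score(true_positives=80, false_positives=20, false_negatives=20)

# Hypothetical scores: moving from 50.0 to 60.5 is a gain of about 21% relative,
# even though the absolute difference is 10.5 points.
example_gain = relative_improvement(new=60.5, baseline=50.0)
```

The distinction matters when reading the numbers above: "21% relative" describes the gain as a fraction of the baseline score, not a 21-point absolute difference in F1.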
Keywords
Speaker change detection, foundation model