Apollo’s Unheard Voices: Graph Attention Networks for Speaker Diarization and Clustering for Fearless Steps Apollo Collection

Meena M. C. Shekar, John H. L. Hansen

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
Speaker diarization has traditionally been explored using datasets that are either clean, feature a limited number of speakers, or have a large volume of data but lack the complexities of real-world scenarios. This study takes a unique approach by focusing on the Fearless Steps APOLLO audio resource, a challenging corpus that contains over 70,000 hours of audio data (A-11: 10k hrs), the majority of which remains unlabeled. This corpus presents considerable challenges such as diverse acoustic conditions, high levels of background noise, overlapping speech, data imbalance, and a variable number of speakers with varying utterance durations. To address these challenges, we propose a robust speaker diarization framework built on a dynamic Graph Attention Network optimized using data augmentation. Our proposed framework attains a Diarization Error Rate (DER) of 19.6% when evaluated using ground truth speech segments. Notably, our work is the first to recognize, track, and perform conversational analysis on the entire Apollo-11 mission for speakers who were unidentified until now. This work stands as a significant contribution to both historical archiving and the development of robust diarization systems, and is particularly relevant for challenging real-world scenarios.
Keywords
Fearless Steps APOLLO, Graph Networks, Speaker Diarization, Historical Archiving