Exploring Pre-Trained Neural Audio Representations for Audio Topic Segmentation.

ICME (2023)

Abstract
Recent work has shown that audio embeddings can improve automatic topic segmentation of formats such as radio shows. We extend that line of work by showing how, and which, publicly available pre-trained neural audio embeddings can perform the task without any further fine-tuning of the audio encoders. The ranking of the encoders suggests that neural encoders pre-trained for speaker diarization and general-purpose audio classification are best suited for use as features, outperforming non-neural baselines. We show that we can obtain perfect results on a newly created random dataset similar to the one used in previous work. We also present, for the first time, results on real-world data, demonstrating that our method can be applied to actual radio shows with good results, although the choice of audio encoder is extremely important for achieving them. Finally, by releasing the datasets we used, we provide the first (to our knowledge) publicly available, free-of-charge datasets for audio topic segmentation of media products.
Keywords
topic segmentation, neural audio embeddings