Multimodal Topic Segmentation of Podcast Shows with Pre-trained Neural Encoders

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval (2023)

Abstract
We present two multimodal models for topic segmentation of podcasts built on pre-trained neural text and audio embeddings. We show that results can be improved not only by combining different modalities but also by combining different encoders from the same modality, especially general-purpose sentence embeddings with specifically fine-tuned ones. We also show that audio embeddings can be replaced by two simple features related to sentence duration and inter-sentential pauses, with comparable results. Finally, we publicly release our two datasets, which are, to our knowledge, the first publicly and freely available multimodal datasets for topic segmentation.
Keywords
topic segmentation, multi-modal