A Knowledge Augmented and Multimodal-Based Framework for Video Summarization

International Multimedia Conference(2022)

引用 4|浏览12
暂无评分
摘要
ABSTRACTVideo summarization aims to generate a compact version of a lengthy video that retains its primary content. In general, humans are gifted with producing a high-quality video summary, because they acquire crucial content through multiple dimensional information and own abundant background knowledge about the original video. However, existing methods rarely consider multichannel information and ignore the impact of external knowledge, resulting in the limited quality of the generated summaries. This paper proposes a knowledge augmented and multimodal-based video summarization method, termed KAMV, to address the problem above. Specifically, we design a knowledge encoder with a hybrid method consisting of generation and retrieval, to capture descriptive content and latent connections between events and entities based on the external knowledge base, which can provide rich implicit knowledge for better comprehending the video viewed. Furthermore, for the sake of exploring the interactions among visual, audio, implicit knowledge and emphasizing the content that is most relevant to the desired summary, we present a fusion module under the supervision of these multimodal information. By conducting extensive experiments on four public datasets, the results demonstrate the superior performance yielded by the proposed KAMV compared to the state-of-the-art video summarization approaches.
更多
查看译文
关键词
knowledge augmented,video,multimodal-based
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要