Query-Biased Self-Attentive Network For Query-Focused Video Summarization

IEEE Transactions on Image Processing (2020)

Abstract
This paper addresses the task of query-focused video summarization, which takes a user query and a long video as inputs and generates a query-focused video summary. Whereas generic video summarization mainly concentrates on finding the most diverse and representative visual content for a summary, query-focused video summarization also considers the user's intent and the semantic meaning of the generated summary. In this paper, we propose a query-biased self-attentive network (QSAN) to tackle this challenge. Our key idea is to exploit the semantic information in video descriptions to generate a generic summary, and then combine it with information from the query to produce a query-focused summary. Specifically, we first propose a hierarchical self-attentive network that models relative relationships at three levels: among frames within a segment, among segments of the same video, and between the textual video description and its related visual content. We train this model on a video captioning dataset and employ a reinforced caption generator to produce a video description, which helps locate important frames and shots. We then build a query-aware scoring module that computes a query-relevance score for each shot and generates the query-focused summary. Extensive experiments on the benchmark dataset demonstrate the competitive performance of our approach compared to existing methods.
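To make the scoring step concrete, below is a minimal sketch of a query-aware scoring module of the kind the abstract describes: shots are contextualized with self-attention and then scored against a query embedding. All names, dimensions, and the fusion scheme (additive query bias after self-attention) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class QueryAwareScorer(nn.Module):
    """Hypothetical sketch of a query-aware scoring module: projects shot
    features and a query embedding into a shared space, applies
    self-attention over shots, and scores each shot by query relevance."""

    def __init__(self, shot_dim=2048, query_dim=300, hidden_dim=512, num_heads=4):
        super().__init__()
        self.shot_proj = nn.Linear(shot_dim, hidden_dim)
        self.query_proj = nn.Linear(query_dim, hidden_dim)
        # Self-attention over shots models inter-shot relationships.
        self.self_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, shots, query):
        # shots: (batch, num_shots, shot_dim); query: (batch, query_dim)
        h = self.shot_proj(shots)                  # (B, N, H)
        h, _ = self.self_attn(h, h, h)             # contextualize shots
        q = self.query_proj(query).unsqueeze(1)    # (B, 1, H)
        # Bias shot representations toward the query before scoring.
        fused = torch.tanh(h + q)                  # (B, N, H)
        scores = self.scorer(fused).squeeze(-1)    # (B, N)
        return torch.sigmoid(scores)               # query relevance in [0, 1]

# Example: score 20 shots of one video against a single query embedding.
scorer = QueryAwareScorer()
shots = torch.randn(1, 20, 2048)   # e.g., pooled CNN features per shot
query = torch.randn(1, 300)        # e.g., averaged word embeddings
print(scorer(shots, query).shape)  # torch.Size([1, 20])
```

In a full system of this kind, the per-shot scores would be thresholded or ranked to select the shots that form the query-focused summary; the hierarchical frame-level and segment-level attention described in the abstract would sit upstream of this module.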
Keywords
Task analysis, Semantics, Visualization, Computational modeling, Generators, Benchmark testing, Instruments, Video summarization, vision and language, self-attention mechanism