TVSum: Summarizing Web Videos Using Titles

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(2015)

Citations: 648
Abstract
Video summarization is a challenging problem in part because knowing which part of a video is important requires prior knowledge about its main topic. We present TVSum, an unsupervised video summarization framework that uses title-based image search results to find visually important shots. We observe that a video title is often carefully chosen to be maximally descriptive of its main topic, and hence images related to the title can serve as a proxy for important visual concepts of the main topic. However, because titles are free-form, unconstrained, and often written ambiguously, images searched using the title can contain noise (images irrelevant to the video content) and variance (images of different topics). To deal with this challenge, we developed a novel co-archetypal analysis technique that learns canonical visual concepts shared between video and images, but not in either alone, by finding a joint-factorial representation of the two data sets. We introduce a new benchmark dataset, TVSum50, that contains 50 videos and their shot-level importance scores annotated via crowdsourcing. Experimental results on two datasets, SumMe and TVSum50, suggest our approach produces superior quality summaries compared to several recently proposed approaches.
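The abstract does not spell out the co-archetypal formulation, but it builds on standard archetypal analysis (Cutler & Breiman), which factorizes a data matrix into convex combinations of "archetypes" that are themselves convex combinations of data points. Below is a minimal, hedged sketch of that plain single-dataset building block, assuming a feature matrix `X` (features x samples, e.g. per-shot visual features); the function names, the projected-gradient solver, and the step size are illustrative choices, not the authors' implementation, and the cross-dataset coupling between video shots and title-search images is not shown.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def archetypal_analysis(X, k, n_iter=200, lr=1e-3, seed=0):
    """Plain archetypal analysis: X (d x n) ~= Z @ A with Z = X @ B,
    where the columns of A (k x n) and B (n x k) lie on the simplex,
    so archetypes are convex combinations of data points and each
    data point is reconstructed as a convex combination of archetypes.
    Solved here by alternating projected gradient steps (a sketch;
    the step size lr may need tuning for your data scale)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    A = rng.random((k, n)); A /= A.sum(axis=0)
    B = rng.random((n, k)); B /= B.sum(axis=0)
    for _ in range(n_iter):
        Z = X @ B                       # current archetypes (d x k)
        R = X - Z @ A                   # reconstruction residual
        A = np.apply_along_axis(        # gradient step on A, then project
            project_simplex, 0, A - lr * (-Z.T @ R))
        B = np.apply_along_axis(        # gradient step on B, then project
            project_simplex, 0, B - lr * (-X.T @ R @ A.T))
    return X @ B, A                     # archetypes Z and mixing weights A

# Hypothetical usage: 128-dim features for 500 shots, 10 archetypes.
X = np.random.rand(128, 500)
Z, A = archetypal_analysis(X, k=10)
```

In the paper's setting, a co-archetypal variant would presumably couple two such factorizations (video-shot features and title-search image features) so that the learned archetypes capture visual concepts shared by both; the sketch above only illustrates the single-dataset case.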
Keywords
unsupervised Web video summarization, title-based image search results, co-archetypal analysis technique, canonical visual concepts, joint-factorial dataset representation, TVSum50 dataset