CoEvo-Net: Coevolution Network for Video Highlight Detection

IEEE Transactions on Circuits and Systems for Video Technology (2022)

Abstract
Video highlight detection (VHD) has emerged as a pressing task owing to the unprecedented growth of video data, such as that from e-commerce live-broadcasting platforms. Many approaches exploit text data, in the form of video descriptions or time-sync comments, to facilitate the VHD task. Despite promising results, they have largely overlooked the noise inherent in the text data and have mostly treated text and video features in isolation. In this paper, we introduce a novel model for VHD, termed Coevolution Network (CoEvo-Net), that explicitly learns language and video features jointly via a coevolution paradigm, in which features from the two data modalities progressively refine each other. This is achieved by a dedicated CoEvo-Cell that takes language and video together as inputs, extracts cross-modality information, and filters out the undesired parts of the input, such as irrelevant words in a sentence. Furthermore, we release a large-scale e-commerce dataset for VHD, in which each video is coupled with a descriptive sentence, to benchmark sentence-based VHD approaches. Extensive experiments on the released dataset demonstrate that CoEvo-Net achieves state-of-the-art performance. Our dataset and code will be made publicly available.
Keywords
Video highlight detection,video analysis,multi-modality
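
The abstract describes the coevolution idea only at a high level, so the following is a minimal illustrative sketch and not the authors' CoEvo-Cell. It assumes word and frame features are plain vectors of equal dimension, uses scaled dot-product similarity for the cross-modality step, and uses a sigmoid gate to down-weight words that match no frame. The function names (coevo_step, coevolve), the gating rule, and the averaging update are all hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]

def coevo_step(words, frames):
    """One hypothetical refinement step: each modality is updated from the other."""
    scale = math.sqrt(len(words[0]))
    # Cross-modality similarity between every word and every frame.
    sim = [[dot(w, f) / scale for f in frames] for w in words]
    # Assumed gate: a word that matches no frame well gets a low relevance.
    relevance = [sigmoid(max(row)) for row in sim]
    new_words = []
    for w, row, r in zip(words, sim, relevance):
        ctx = weighted_sum(softmax(row), frames)  # video context for this word
        new_words.append([r * 0.5 * (a + b) for a, b in zip(w, ctx)])
    new_frames = []
    for j, f in enumerate(frames):
        attn = softmax([sim[i][j] for i in range(len(words))])
        # Gated attention: low-relevance words contribute less to the frame.
        weights = [a * r for a, r in zip(attn, relevance)]
        ctx = weighted_sum(weights, words)  # language context for this frame
        new_frames.append([0.5 * (a + b) for a, b in zip(f, ctx)])
    return new_words, new_frames, relevance

def coevolve(words, frames, steps=3):
    """Apply the step repeatedly so the two feature sets refine each other."""
    relevance = [1.0] * len(words)
    for _ in range(steps):
        words, frames, relevance = coevo_step(words, frames)
    return words, frames, relevance
```

For example, calling coevo_step with words [[1.0, 0.0], [-1.0, 0.0]] and frames [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]] gives the first word a higher relevance than the second, since only the first aligns with any frame. A learned model would replace the fixed averaging and gating with trained parameters.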