Video Summarization With Anchors And Multi-Head Attention

2020 IEEE International Conference on Image Processing (ICIP)

Abstract
Video summarization is a challenging task that aims to automatically generate a representative and attractive highlight movie from a source video. Previous works explicitly exploit the hierarchical structure of video to train a summarizer. However, these methods sometimes use fixed-length segmentation, which breaks the video structure, or require additional training data to train a segmentation model. In this paper, we propose an Anchor-Based Attention RNN (ABA-RNN) for the video summarization problem. ABA-RNN makes two contributions. First, we obtain frame-level and clip-level features through an anchor-based approach, and the model needs only a single RNN layer by adopting the subtraction mechanism used in minus-LSTM; multi-head attention then lets the model select suitable segment lengths. Second, we do not need any extra video preprocessing to determine shot boundaries, and the architecture is trained end-to-end. In experiments on the standard SumMe and TVSum datasets, we achieve performance competitive with state-of-the-art results.
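The abstract describes the architecture only at a high level. The sketch below is one plausible reading of the anchor plus multi-head-attention idea, not the authors' ABA-RNN: the anchor lengths, the average-pooling of anchor windows into clip-level features, the AnchorAttention module name, and all layer sizes are assumptions introduced for illustration, and the minus-LSTM subtraction recurrence is omitted.

```python
# Hypothetical sketch: anchor-pooled clip features fused per frame via
# multi-head attention. Illustrative only; not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnchorAttention(nn.Module):
    def __init__(self, feat_dim=1024, anchor_lengths=(4, 8, 16, 32), num_heads=4):
        super().__init__()
        self.anchor_lengths = anchor_lengths
        # Multi-head attention lets each frame attend over its clip-level
        # (anchor-pooled) views and weight the suitable segment lengths.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.score = nn.Linear(feat_dim, 1)  # per-frame importance score

    def clip_features(self, x):
        # x: (batch, time, feat). Average-pool a window of each anchor length
        # around every frame to get clip-level features, with no
        # shot-boundary preprocessing.
        pooled = []
        for length in self.anchor_lengths:
            xp = F.avg_pool1d(x.transpose(1, 2), kernel_size=length, stride=1,
                              padding=length // 2)[..., :x.size(1)]
            pooled.append(xp.transpose(1, 2))
        return torch.stack(pooled, dim=2)  # (batch, time, n_anchors, feat)

    def forward(self, x):
        b, t, d = x.shape
        clips = self.clip_features(x)      # (b, t, A, d)
        q = x.reshape(b * t, 1, d)         # one query per frame
        kv = clips.reshape(b * t, -1, d)   # that frame's anchor-pooled views
        fused, _ = self.attn(q, kv, kv)    # attend over anchor scales
        fused = fused.reshape(b, t, d)
        return self.score(fused).squeeze(-1)  # frame-level scores

# Usage: scores = AnchorAttention()(torch.randn(1, 300, 1024))
```

In this reading, attention weights over the anchor scales play the role of choosing a suitable segment length per frame; the resulting frame scores would then be thresholded or ranked to assemble the highlight.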
Keywords
Video summarization, multi-head attention, anchors, deep learning