VideoWhisper: Towards Unsupervised Learning of Discriminative Features of Videos with RNN

2017 IEEE International Conference on Multimedia and Expo (ICME)

Abstract
We present VideoWhisper, a novel approach to unsupervised video representation learning in which the video sequence itself serves as a self-supervision signal, based on the observation that the sequence encodes video temporal dynamics (e.g., object movement and event evolution). Specifically, for each video sequence, we use a pre-learned visual dictionary to generate a sequence of high-level semantics, dubbed a "whisper", which encodes both visual content at the frame level and visual dynamics at the sequence level. VideoWhisper is driven by a novel "sequence-to-whisper" learning strategy: an end-to-end sequence-to-sequence model built on RNNs is trained to predict the whisper sequence. We propose two ways to derive video representations from the trained model. Extensive experiments demonstrate that the representation learned by VideoWhisper effectively boosts fundamental video-related applications such as video retrieval and classification.
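The abstract leaves the "whisper" generation step implicit. One common way to realize a pre-learned visual dictionary is to quantize each frame's feature vector to its nearest dictionary word, yielding a sequence of word indices over time. The sketch below illustrates that quantization step only; the dictionary values, feature dimensions, and the function name `whisper_sequence` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def whisper_sequence(frames, dictionary):
    """Map each frame feature to its nearest visual-dictionary word.

    frames: (T, D) array of frame-level features (e.g., CNN descriptors)
    dictionary: (K, D) array of pre-learned visual words
    returns: length-T array of word indices -- the "whisper" sequence
    """
    # Squared Euclidean distance from every frame to every dictionary word.
    d = ((frames[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=-1)
    # Nearest word per frame gives the frame-level semantic label.
    return d.argmin(axis=1)

# Toy example: 3 frames, a 2-word dictionary.
frames = np.array([[0.0, 0.1], [0.9, 1.0], [0.1, 0.0]])
dictionary = np.array([[0.0, 0.0], [1.0, 1.0]])
print(whisper_sequence(frames, dictionary).tolist())  # [0, 1, 0]
```

A sequence-to-sequence RNN would then be trained to map the raw frame sequence to this index sequence, and its hidden states used as the video representation.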
Keywords
Unsupervised Feature Learning, Video Features, Sequence Learning, Recurrent Neural Networks