Cluster Encoding For Modelling Temporal Variation In Video

2015 IEEE International Conference on Image Processing (ICIP), 2015

Abstract
Classical Bag-of-Words methods represent videos by modeling the variation of local visual descriptors throughout the video. This approach mixes variation in time and space indiscriminately, even though these dimensions are fundamentally different. In this paper we therefore present a novel method for video representation that explicitly captures variation over time. We first create frame-based features using standard Bag-of-Words techniques. To model the temporal variation of these frame-based features, we introduce Hard and Soft Cluster Encoding, novel encoding techniques inspired by the Fisher Kernel [1] and VLAD [2]. Results on the Rochester ADL [3] and Blip10k [4] datasets show that our method yields improvements of 6.6% and 7.4%, respectively, over our baselines. On Blip10k we outperform the state-of-the-art by 3.6% when using only visual features.
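The abstract does not give the exact formulation of Hard and Soft Cluster Encoding, but since they are described as VLAD- and Fisher-Kernel-inspired, a plausible sketch is residual aggregation of per-frame features against a set of cluster centers, with either hard (nearest-center) or soft (kernel-weighted) assignment. The function names, the Gaussian soft-assignment weights, and the `sigma` parameter below are assumptions for illustration, not the paper's definitive method:

```python
import numpy as np

def hard_cluster_encoding(frame_feats, centers):
    """Hypothetical hard encoding: each frame feature is assigned to its
    nearest cluster center; residuals are summed per cluster (VLAD-style)."""
    K, D = centers.shape
    # pairwise distances between T frame features and K centers -> (T, K)
    dists = np.linalg.norm(frame_feats[:, None, :] - centers[None, :, :], axis=2)
    assign = dists.argmin(axis=1)            # nearest center per frame
    enc = np.zeros((K, D))
    for k in range(K):
        members = frame_feats[assign == k]
        if len(members):
            enc[k] = (members - centers[k]).sum(axis=0)
    enc = enc.ravel()
    norm = np.linalg.norm(enc)
    return enc / norm if norm > 0 else enc   # L2-normalized descriptor

def soft_cluster_encoding(frame_feats, centers, sigma=1.0):
    """Hypothetical soft encoding: residuals weighted by a Gaussian soft
    assignment over centers (Fisher-Kernel-like), then summed over time."""
    dists = np.linalg.norm(frame_feats[:, None, :] - centers[None, :, :], axis=2)
    w = np.exp(-dists ** 2 / (2.0 * sigma ** 2))   # (T, K) soft weights
    w /= w.sum(axis=1, keepdims=True)              # normalize per frame
    resid = frame_feats[:, None, :] - centers[None, :, :]  # (T, K, D)
    enc = (w[:, :, None] * resid).sum(axis=0).ravel()
    norm = np.linalg.norm(enc)
    return enc / norm if norm > 0 else enc
```

Both functions map a variable-length sequence of T frame features of dimension D to a fixed K×D descriptor, which is what allows a standard classifier to be trained on videos of different lengths.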
Keywords
modeling temporal variation in video, temporal Fisher Kernel encoding, temporal VLAD encoding, video classification