GCF-Net: Gated Clip Fusion Network for Video Action Recognition

ECCV Workshops (2020)

Abstract

In recent years, most of the accuracy gains in video action recognition have come from newly designed CNN architectures (e.g., 3D-CNNs). These models are trained by applying a deep CNN to a single clip of fixed temporal length. Since each video segment is processed by the 3D-CNN module separately, the corresponding clip descriptor is local and the inter-clip relationships are inherently implicit. The common method of directly averaging the clip-level outputs into a video-level prediction is prone to fail because it lacks a mechanism that can extract and integrate the relevant information to represent the video.
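The averaging baseline described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the function names `average_clip_scores` and `predict`, the two-class setup, and the per-clip scores are all hypothetical values chosen to show how a single relevant clip can be outvoted by background clips.

```python
from typing import List


def average_clip_scores(clip_scores: List[List[float]]) -> List[float]:
    """Average per-clip class scores into one video-level score vector."""
    if not clip_scores:
        raise ValueError("need at least one clip")
    num_clips = len(clip_scores)
    num_classes = len(clip_scores[0])
    return [
        sum(clip[c] for clip in clip_scores) / num_clips
        for c in range(num_classes)
    ]


def predict(scores: List[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Hypothetical softmax outputs for 4 clips over 2 classes (assumed values).
# Only the clip at index 2 shows the action (class 1); the other clips
# are background-like and favour class 0.
clip_scores = [
    [0.70, 0.30],
    [0.75, 0.25],
    [0.10, 0.90],
    [0.70, 0.30],
]

video_scores = average_clip_scores(clip_scores)
video_label = predict(video_scores)
```

With these assumed scores, the video-level prediction is class 0 even though the one informative clip strongly indicates class 1, because uniform averaging gives every clip the same weight regardless of relevance.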

Keywords

Video action recognition, 3D-CNNs, Dense clip sampling, Clip fusion
