GCF-Net: Gated Clip Fusion Network for Video Action Recognition

ECCV Workshops (2020)

Abstract
In recent years, most of the accuracy gains in video action recognition have come from newly designed CNN architectures (e.g., 3D-CNNs). These models are trained by applying a deep CNN to a single clip of fixed temporal length. Since each video segment is processed by the 3D-CNN module separately, the resulting clip descriptor is local and inter-clip relationships remain implicit. The common method of directly averaging the clip-level outputs into a video-level prediction is prone to failure because it lacks a mechanism to extract and integrate the information most relevant to representing the whole video.
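The baseline criticized above can be sketched as follows. This is a minimal, hypothetical illustration (NumPy, made-up logits), not the paper's method: each clip is scored independently by a 3D-CNN, and the per-clip class probabilities are simply averaged to obtain the video-level prediction.

```python
import numpy as np

# Hypothetical clip-level logits from a 3D-CNN backbone:
# 4 clips sampled from one video, 5 action classes.
clip_logits = np.array([
    [2.0, 0.5, 0.1, 0.0, 0.3],
    [1.5, 0.7, 0.2, 0.1, 0.0],
    [0.2, 2.2, 0.1, 0.0, 0.4],  # an off-topic clip votes differently
    [1.8, 0.3, 0.0, 0.2, 0.1],
])

def average_clip_predictions(logits):
    """Softmax each clip's logits, then average over clips.

    This naive fusion weights every clip equally, regardless of
    how informative it is for the action in the video.
    """
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)  # video-level class distribution

video_probs = average_clip_predictions(clip_logits)
pred = int(video_probs.argmax())
```

A gated fusion module, by contrast, would learn per-clip weights instead of the uniform `mean` above, suppressing uninformative clips before aggregation.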
Keywords
Video action recognition, 3D-CNNs, Dense clip sampling, Clip fusion