Delving into Details: Synopsis-to-Detail Networks for Video Recognition.

European Conference on Computer Vision(2022)

引用 1|浏览37
暂无评分
摘要
In this paper, we explore the details in video recognition with the aim to improve the accuracy. It is observed that most failure cases in recent works fall on the mis-classifications among very similar actions (such as high kick vs. side kick) that need a capturing of fine-grained discriminative details. To solve this problem, we propose synopsis-to-detail networks for video action recognition. Firstly, a synopsis network is introduced to predict the top-k likely actions and generate the synopsis (location & scale of details and contextual features). Secondly, according to the synopsis, a detail network is applied to extract the discriminative details in the input and infer the final action prediction. The proposed synopsis-to-detail networks enable us to train models directly from scratch in an end-to-end manner and to investigate various architectures for synopsis/detail recognition. Extensive experiments on benchmark datasets, including Kinetics-400, Mini-Kinetics and Something-Something V1 & V2, show that our method is more effective and efficient than the competitive baselines. Code is available at: https://github.com/liang4sx/S2DNet.
更多
查看译文
关键词
Video recognition,Action recognition
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要