Temporal action detection with dynamic weights based on curriculum learning

Neurocomputing (2023)

Abstract
Temporal action localization requires a model to recognize the locations and classes of action instances in a video. The main challenge is that the videos are often long and untrimmed, containing action content of varying kinds and durations. Existing temporal action detection frameworks exhibit a gap between the training and testing phases that is detrimental to model performance: during training, all positive samples are treated identically, whereas during testing only the positive samples with the best classification and localization scores are selected and all others are suppressed. To mitigate this issue, we build an auxiliary branch that unifies the training and testing procedures. Within this branch, we design a dynamic weighting strategy based on curriculum learning, where the weight of each training sample is a combination of its classification and localization scores. Following the intuition of curriculum learning, the two scores are emphasized differently at different training stages: the classification score accounts for a larger share of the combined score early in training, and the share of the localization score gradually increases as the epochs progress. Experimental results demonstrate that our curriculum-based method improves the performance of current action localization techniques. On THUMOS14, it outperforms the existing state of the art (57.6% vs. 55.5%), and it reaches 35.4% mAP@Avg on ActivityNet v1.3.
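The abstract describes the dynamic weighting only qualitatively. Below is a minimal sketch of one plausible schedule, assuming a linear shift from the classification score to the localization score over the course of training; the paper's exact combination function, schedule, and variable names are not given here, so everything in this snippet is illustrative.

```python
def curriculum_weight(cls_score: float, loc_score: float,
                      epoch: int, total_epochs: int) -> float:
    """Hypothetical per-sample weight combining classification and
    localization scores, with the mix shifting toward localization
    as training progresses (the paper's actual schedule may differ)."""
    # alpha decays linearly from 1 to 0: early epochs emphasize the
    # classification score, later epochs the localization score.
    alpha = 1.0 - epoch / max(total_epochs - 1, 1)
    return alpha * cls_score + (1.0 - alpha) * loc_score


# Example: a positive proposal with a confident class prediction but loose
# temporal boundaries is weighted highly early on and down-weighted later.
print(curriculum_weight(cls_score=0.9, loc_score=0.4, epoch=0, total_epochs=50))   # 0.90
print(curriculum_weight(cls_score=0.9, loc_score=0.4, epoch=49, total_epochs=50))  # 0.40
```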
Keywords
Video understanding, Video classification, Action segmentation in videos