LPR: learning point-level temporal action localization through re-training

Multimedia Systems (2023)

Abstract
Point-level temporal action localization (PTAL) aims to locate action instances in untrimmed videos given only one timestamp annotation per action instance. Existing methods adopt the localization-by-classification paradigm, locating action boundaries by thresholding a temporal class activation map (TCAM); these are known as TCAM-based methods. However, TCAM-based methods are limited by the gap between the classification and localization tasks, since the TCAM is generated by a classification network. To address this issue, we propose a re-training framework for the PTAL task, termed LPR. The framework consists of two stages: pseudo-label generation and re-training. In the pseudo-label generation stage, we propose a feature embedding module based on a transformer encoder to capture global context features and improve the quality of the pseudo-labels by leveraging the point-level annotations. In the re-training stage, LPR uses these pseudo-labels as supervision and locates action instances with a temporal action localization network rather than by generating TCAMs. Furthermore, to alleviate the effect of label noise in the pseudo-labels, we propose a joint learning classification module (JLCM) for the re-training stage. This module contains two classification sub-modules that simultaneously predict action categories and are guided by a jointly determined clean set during network training. The proposed framework achieves state-of-the-art localization performance on both the THUMOS’14 and BEOID datasets.
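The abstract does not give implementation details of the JLCM, so the following is a minimal PyTorch-style sketch of one plausible reading: two classification heads score each snippet against its pseudo-label, and a jointly determined clean set of low-loss snippets supervises both heads. All names (TwoHeadClassifier, jlcm_loss, keep_ratio) and the small-loss selection rule are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only: assumes a small-loss, jointly-selected clean set.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoHeadClassifier(nn.Module):
    """Two classification sub-modules over shared snippet features."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.head_a = nn.Linear(feat_dim, num_classes)
        self.head_b = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):            # feats: (num_snippets, feat_dim)
        return self.head_a(feats), self.head_b(feats)


def jlcm_loss(logits_a, logits_b, pseudo_labels, keep_ratio=0.7):
    """Keep the snippets whose pseudo-labels both heads fit with small loss
    (the 'clean set') and train both heads on that set only."""
    loss_a = F.cross_entropy(logits_a, pseudo_labels, reduction="none")
    loss_b = F.cross_entropy(logits_b, pseudo_labels, reduction="none")
    joint = loss_a + loss_b                    # joint noisiness score
    k = max(1, int(keep_ratio * joint.numel()))
    clean_idx = torch.topk(-joint, k).indices  # k smallest joint losses
    return loss_a[clean_idx].mean() + loss_b[clean_idx].mean()


if __name__ == "__main__":
    model = TwoHeadClassifier(feat_dim=2048, num_classes=20)
    feats = torch.randn(128, 2048)             # snippet features
    pseudo = torch.randint(0, 20, (128,))      # noisy pseudo-labels
    logits_a, logits_b = model(feats)
    print(jlcm_loss(logits_a, logits_b, pseudo))
```

The joint selection is what distinguishes this from training a single head: a snippet only contributes to the loss if both sub-modules find its pseudo-label easy to fit, which is one common way to filter label noise.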
Keywords
Temporal action localization, Point-level, Pseudo-label, Joint learning, Re-training