APSL: Action-positive separation learning for unsupervised temporal action localization

Information Sciences (2023)

Abstract
Unsupervised temporal action localization in untrimmed videos is a challenging open problem. Existing works adopt a "clustering + localization" framework for this task. However, this framework relies heavily on the quality of the features used for clustering and localization; for example, features that carry background information degrade localization performance. To address this problem, we propose a novel Action-positive Separation Learning (APSL) method, which follows an iterative "feature separation + clustering + localization" procedure. First, we introduce a feature separation learning (FSL) module. FSL employs separation learning to identify action and background features in a video, then uses contrastive learning to refine the identified features and remove potential action-negative and background-negative (hard-to-locate) features, yielding action-positive (easy-to-locate) features. Next, in the clustering step, we cluster the separated action-positive features to obtain action pseudo-labels. In the localization step, given the action pseudo-labels and action-positive features, a temporal action localization module locates action instance regions, which in turn improves the performance of clustering and FSL. The three steps are learned iteratively and reinforce one another during training. Comprehensive evaluations on the THUMOS'14 and ActivityNet v1.2 datasets demonstrate that our method outperforms cutting-edge weakly supervised and unsupervised methods, achieving state-of-the-art performance.
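The "feature separation + clustering + localization" pipeline described in the abstract can be illustrated with a toy, single-pass sketch. This is not the authors' implementation: the abstract gives no architectural details, so every component here is a simplified stand-in. The feature-norm score in `separate_action_positive`, the plain k-means in `kmeans`, the run-grouping in `localize`, and all function and parameter names (`keep_ratio`, `num_clusters`, `demo`) are assumptions made for illustration. There is also no learning and no iteration between the steps.

```python
import numpy as np


def separate_action_positive(features, keep_ratio=0.5):
    """Toy stand-in for feature separation (assumption, not the paper's FSL).

    Scores each snippet by its feature norm and keeps the top `keep_ratio`
    fraction as "action-positive". Returns a boolean mask over snippets.
    """
    scores = np.linalg.norm(features, axis=1)
    k = max(1, int(round(keep_ratio * len(features))))
    keep = np.zeros(len(features), dtype=bool)
    keep[np.argsort(scores)[-k:]] = True
    return keep


def kmeans(points, num_clusters, num_iters=20, seed=0):
    """Plain k-means; returns one pseudo-label per point."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=num_clusters, replace=False)
    centers = points[idx].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(num_iters):
        # Distance from every point to every center, shape (N, K).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


def localize(keep, labels):
    """Group consecutive kept snippets that share a pseudo-label.

    Returns a list of (start, end, label) with `end` exclusive.
    """
    full = np.full(len(keep), -1)  # -1 marks snippets that were filtered out
    full[keep] = labels
    segments, start = [], None
    for t in range(len(full) + 1):
        cur = full[t] if t < len(full) else -1
        if start is not None and cur != full[start]:
            segments.append((start, t, int(full[start])))
            start = None
        if start is None and cur != -1:
            start = t
    return segments


def demo():
    """Synthetic 12-snippet video: background, action A, background, action B."""
    rng = np.random.default_rng(0)
    features = 0.05 * rng.standard_normal((12, 4))
    features[3:6, 0] += 5.0   # hypothetical action A occupies snippets 3..5
    features[9:12, 1] += 5.0  # hypothetical action B occupies snippets 9..11
    keep = separate_action_positive(features, keep_ratio=0.5)
    labels = kmeans(features[keep], num_clusters=2)
    return localize(keep, labels)
```

Calling `demo()` returns a list of `(start, end, label)` segments for the synthetic video. In the method as the abstract describes it, the localization output would feed back into clustering and feature separation over several training rounds; the sketch omits that loop.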
Keywords
Unsupervised temporal action localization, Clustering, Feature separation learning