Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
CVPR 2024 (2023)
Abstract
We study multi-label atomic activity recognition. Despite notable progress in
action recognition, recognizing atomic activities remains challenging due to
the lack of a holistic understanding of both multiple road users' motions and
their contextual information. To address this,
we introduce Action-slot, a slot attention-based approach that learns visual
action-centric representations, capturing both motion and contextual
information. Our key idea is to design action slots that are capable of paying
attention to regions where atomic activities occur, without the need for
explicit perception guidance. To further enhance slot attention, we introduce a
background slot that competes with action slots, aiding the training process in
avoiding unnecessary focus on background regions devoid of activities. However, the
imbalanced class distribution in the existing dataset hampers the assessment of
rare activities. To address this limitation, we collect a synthetic dataset
called TACO, which is four times larger than OATS and features a balanced
distribution of atomic activities. To validate the effectiveness of our method,
we conduct comprehensive experiments and ablation studies against various
action recognition baselines. We also show that the performance of multi-label
atomic activity recognition on real-world datasets can be improved by
pretraining representations on TACO. We will release our source code and
dataset. Visualization videos are available on the project page:
https://hcis-lab.github.io/Action-slot/
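
The competition between action slots and a background slot described in the abstract can be illustrated with a minimal sketch. It assumes the standard slot-attention formulation, in which attention weights are normalized across slots at each location, and it is not the authors' implementation. The function name `slot_attention_masks`, the array shapes, and the choice of placing the background slot in the last row are all hypothetical.

```python
import numpy as np

def slot_attention_masks(features, slots):
    """Compute per-slot attention over feature locations.

    features: (N, D) array of flattened spatio-temporal features (assumed layout).
    slots:    (K + 1, D) array; by assumption the last row is the background slot.
    Returns a (K + 1, N) array whose columns each sum to 1, so slots compete
    for every location.
    """
    d = features.shape[1]
    logits = slots @ features.T / np.sqrt(d)       # (K + 1, N) similarity scores
    logits -= logits.max(axis=0, keepdims=True)    # subtract max for numerical stability
    weights = np.exp(logits)
    # Softmax over the slot axis: a location claimed by the background slot
    # contributes less to every action slot.
    return weights / weights.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))   # 6 locations, 4-dim features (toy sizes)
slots = rng.normal(size=(3, 4))      # 2 action slots + 1 background slot
attn = slot_attention_masks(features, slots)
action_attn, background_attn = attn[:-1], attn[-1]
```

Because the normalization runs over slots rather than over locations, any attention mass the background slot takes at a location is removed from the action slots there, which is one plausible reading of how a background slot discourages attention on regions without activities.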