Semantic Video Transformer for Robust Action Recognition.

2023 IEEE Conference on Dependable and Secure Computing (DSC) (2023)

Abstract
Video action recognition has attracted significant research attention over the past several years. Although adversarial effects and robustness in image classification models have been heavily investigated, the robustness of action recognition models to natural or adversarial perturbations remains largely unexplored. Moreover, even though transformer-based approaches have shown great promise on various vision tasks, they have yet to be evaluated in terms of their robustness. To this end, we propose a Semantic Video Transformer for Action Recognition (SeViTAR), which maps visual features obtained by a video transformer to a more robust visual-semantic representation. We extensively evaluate the proposed approach on the ROSE Challenge dataset and outperform all baselines by a significant margin.
Keywords
robustness, action recognition, video transformer, semantic mapping
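
The abstract does not say how visual features are mapped to the visual-semantic representation. As a rough illustration only, the sketch below assumes one common visual-semantic embedding setup: a linear projection of a clip-level feature into a semantic space, followed by cosine similarity against per-class label embeddings. The names `projection`, `label_embeddings`, and `classify`, and all numbers, are hypothetical and are not taken from SeViTAR.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(visual_feature, projection, label_embeddings):
    """Project a visual feature into the semantic space and pick the closest label.

    Assumption: the mapping is a plain linear projection and labels are
    compared by cosine similarity. The paper may use something different.
    """
    semantic = matvec(projection, visual_feature)
    scores = {
        label: cosine(semantic, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy data (hypothetical): a 4-d "video transformer" feature and a 2-d semantic space.
projection = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
label_embeddings = {
    "running": [1.0, 0.0],
    "jumping": [0.0, 1.0],
}
clip_feature = [0.9, 0.1, 0.3, -0.2]

label, scores = classify(clip_feature, projection, label_embeddings)
print(label, scores)
```

In a setup like this, the projection would typically be learned and the label embeddings would come from a language model, so that the class decision depends on semantic similarity rather than on raw visual features alone.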