SWAG-V: Explanations for Video using Superpixels Weighted by Average Gradients

2022 IEEE Winter Conference on Applications of Computer Vision (WACV 2022)

Abstract
CNN architectures that take video as input are often overlooked in the development of explanation techniques, even though they are used in critical domains such as surveillance and healthcare. Explanation techniques developed for these networks must account for the additional temporal dimension if they are to be successful. In this paper we introduce SWAG-V, an extension of SWAG for networks that take video as input. We also show how these explanations can be constructed so that they are balanced between fine and coarse detail. By creating superpixels that incorporate the frames of the input video, we produce explanations that better locate the regions of the input that are important to the network's prediction. We compare SWAG-V against a number of similar techniques using metrics such as insertion, deletion, and weak localisation. We compute these on Kinetics-400 with both the C3D and R(2+1)D network architectures and find that SWAG-V outperforms multiple techniques.
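The abstract does not give implementation details, so the following is only a minimal sketch of the idea suggested by the method's name: each superpixel that extends across video frames is given the average gradient magnitude of the voxels it contains. The function name, the use of absolute gradients, the (T, H, W) array layout, and the toy two-region segmentation are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def weight_superpixels_by_mean_gradient(gradients, labels):
    """Give every voxel the mean absolute gradient of its superpixel.

    gradients: float array of shape (T, H, W); hypothetical per-voxel
               gradients of the class score with respect to the input.
    labels:    int array of the same shape holding non-negative
               superpixel ids; an id may span several frames.
    """
    if gradients.shape != labels.shape:
        raise ValueError("gradients and labels must have the same shape")

    flat_labels = labels.ravel()
    # Assumption: gradient magnitude, not signed gradient, is averaged.
    flat_grads = np.abs(gradients).ravel()

    n_segments = int(flat_labels.max()) + 1
    sums = np.bincount(flat_labels, weights=flat_grads, minlength=n_segments)
    counts = np.bincount(flat_labels, minlength=n_segments)
    # Guard against unused ids so the division never hits zero.
    means = sums / np.maximum(counts, 1)

    # Broadcast each superpixel's mean back onto its voxels.
    return means[labels]


# Toy input: 4 frames of 8x8 gradients, split into two spatio-temporal
# regions (left and right halves, shared across all frames).
rng = np.random.default_rng(0)
gradients = rng.normal(size=(4, 8, 8))
labels = np.zeros((4, 8, 8), dtype=np.int64)
labels[:, :, 4:] = 1

saliency = weight_superpixels_by_mean_gradient(gradients, labels)
```

In this sketch the resulting map is piecewise constant: its granularity is set entirely by the segmentation, which is one way a trade-off between fine and coarse explanations could be controlled.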
Keywords
Explainable AI, Fairness, Accountability, Privacy and Ethics in Vision; Action and Behavior Recognition