Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability
arXiv (2023)
Abstract
Analysis of how semantic concepts are represented within Convolutional Neural
Networks (CNNs) is a widely used approach in Explainable Artificial
Intelligence (XAI) for interpreting CNNs. A motivation is the need for
transparency in safety-critical AI-based systems, as mandated in various
domains like automated driving. However, to use the concept representations for
safety-relevant purposes, such as inspection or error retrieval, they must be of
high quality and, in particular, stable. This paper focuses on two stability
goals when working with concept representations in computer vision CNNs:
stability of concept retrieval and of concept attribution. The guiding use-case
is a post-hoc explainability framework for object detection (OD) CNNs, towards
which existing concept analysis (CA) methods are successfully adapted. To
address concept retrieval stability, we propose a novel metric that considers
both concept separation and consistency, and is agnostic to layer and concept
representation dimensionality. We then investigate the impacts of concept
abstraction level, number of concept training samples, CNN size, and concept
representation dimensionality on stability. For concept attribution stability,
we explore the effect of gradient instability on gradient-based explainability
methods. The results on various CNNs for classification and object detection
yield the main findings that (1) the stability of concept retrieval can be
enhanced through dimensionality reduction via data aggregation, and (2) in
shallow layers where gradient instability is more pronounced, gradient
smoothing techniques are advised. Finally, our approach provides valuable
insights into selecting the appropriate layer and concept representation
dimensionality, paving the way towards CA in safety-critical XAI applications.
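To make finding (1) concrete, below is a minimal sketch of dimensionality reduction via data aggregation: convolutional activations are mean-pooled over the spatial axes before fitting a linear concept probe, in the spirit of TCAV/Net2Vec. The function names, shapes, and the choice of logistic regression are illustrative assumptions, not the paper's exact method.

import numpy as np
from sklearn.linear_model import LogisticRegression

def aggregate_activations(acts: np.ndarray) -> np.ndarray:
    """Reduce (N, C, H, W) conv activations to (N, C) by spatial mean pooling.

    Pooling over the spatial axes shrinks the concept representation
    dimensionality from C*H*W to C -- one way to realize 'dimensionality
    reduction via data aggregation' as described in the abstract.
    """
    return acts.mean(axis=(2, 3))

def fit_concept_vector(acts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Fit a linear probe separating concept vs. non-concept activations.

    Returns the unit-norm normal of the decision hyperplane, i.e. a
    concept activation vector (CAV) in the TCAV/Net2Vec sense.
    """
    clf = LogisticRegression(max_iter=1000).fit(aggregate_activations(acts), labels)
    w = clf.coef_.ravel()
    return w / np.linalg.norm(w)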
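For finding (2), the following hedged sketch shows one standard gradient-smoothing technique, SmoothGrad (Smilkov et al., 2017): input gradients are averaged over noisy copies of the input. The abstract advises smoothing where shallow-layer gradients are unstable but does not prescribe this exact routine; the model interface and tensor shapes here are assumptions.

import torch

def smoothgrad(model, x, target, n_samples=25, sigma=0.1):
    """Average input gradients over noisy copies of x (SmoothGrad).

    Assumes a classification model mapping (1, C, H, W) inputs to
    (1, num_classes) logits; `target` is the class index to explain.
    Call model.eval() beforehand; parameter grads accumulated here
    can be cleared with model.zero_grad() afterwards.
    """
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        # detach() ensures the noisy copy is a leaf tensor so .grad is populated
        noisy = (x.detach() + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[0, target]
        score.backward()
        grads += noisy.grad
    return grads / n_samples

For object detection, the scalar score would instead be a selected detection logit; the averaging itself is unchanged.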