Estimation of Concept Explanations Should be Uncertainty Aware
arXiv (2023)
Abstract
Model explanations can be valuable for interpreting and debugging predictive
models. We study a specific kind called Concept Explanations, where the goal is
to interpret a model using human-understandable concepts. Although popular for
their easy interpretation, concept explanations are known to be noisy. We begin
our work by identifying various sources of uncertainty in the estimation
pipeline that lead to such noise. We then propose an uncertainty-aware Bayesian
estimation method to address these issues, which readily improves the quality
of explanations. We demonstrate with theoretical analysis and empirical
evaluation that explanations computed by our method are robust to train-time
choices while also being label-efficient. Further, in an evaluation with real
datasets and off-the-shelf models, our method recovers relevant concepts from a
bank of thousands, demonstrating its scalability. We believe the improved
quality of uncertainty-aware concept explanations makes them a strong candidate
for more reliable model interpretation. We release our
code at https://github.com/vps-anonconfs/uace.
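To make the idea of "uncertainty-aware Bayesian estimation" concrete, below is a minimal illustrative sketch in Python. It is not the paper's actual UACE algorithm; it stands in a generic Bayesian surrogate: regressing model outputs on concept activations with a Gaussian prior, so each concept's importance comes with a posterior mean and a variance that quantifies estimation uncertainty. The function name and all parameters here are hypothetical.

```python
import numpy as np

def bayesian_concept_importance(C, y, prior_var=1.0, noise_var=0.1):
    """Posterior over linear concept-importance weights.

    Hypothetical illustration only: treats concept explanation as
    Bayesian linear regression of model outputs y (shape (n,)) on
    concept activations C (shape (n, k)), with an isotropic Gaussian
    prior on the weights. This is NOT the paper's exact estimator,
    just the general uncertainty-aware idea.
    """
    n, k = C.shape
    # Posterior precision = prior precision + data precision.
    A = np.eye(k) / prior_var + C.T @ C / noise_var
    cov = np.linalg.inv(A)            # posterior covariance of weights
    mean = cov @ C.T @ y / noise_var  # posterior mean importance
    std = np.sqrt(np.diag(cov))       # per-concept uncertainty
    return mean, std

# Toy usage: 3 informative concepts hidden among 50 candidates.
rng = np.random.default_rng(0)
C = rng.normal(size=(200, 50))
true_w = np.zeros(50)
true_w[:3] = [2.0, -1.5, 1.0]
y = C @ true_w + 0.1 * rng.normal(size=200)

mean, std = bayesian_concept_importance(C, y)
# Keep only concepts whose 95% credible interval excludes zero,
# i.e. concepts whose importance is distinguishable from noise.
relevant = np.where(np.abs(mean) > 1.96 * std)[0]
print(relevant)  # expected: [0 1 2]
```

The uncertainty estimate is what makes the selection step robust: concepts with noisy or weakly supported importance scores get wide posteriors and are filtered out, rather than being ranked by a noisy point estimate alone.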