Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations
ICLR 2024
Abstract
Existing methods, such as concept bottleneck models (CBMs), have been
successful in providing concept-based interpretations for black-box deep
learning models. They typically work by predicting concepts given the input and
then predicting the final class label given the predicted concepts. However,
(1) they often fail to capture the high-order, nonlinear interaction between
concepts, e.g., correcting a predicted concept (e.g., "yellow breast") does not
help correct highly correlated concepts (e.g., "yellow belly"), leading to
suboptimal final accuracy; (2) they cannot naturally quantify the complex
conditional dependencies between different concepts and class labels (e.g., for
an image with the class label "Kentucky Warbler" and a concept "black bill",
what is the probability that the model correctly predicts another concept
"black crown"), therefore failing to provide deeper insight into how a
black-box model works. In response to these limitations, we propose
Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural
networks to define the joint energy of candidate (input, concept, class)
tuples. With such a unified interface, prediction, concept correction, and
conditional dependency quantification are then represented as conditional
probabilities, which are generated by composing different energy functions. Our
ECBMs address both limitations of existing CBMs, providing higher accuracy and
richer concept interpretations. Empirical results show that our approach
outperforms the state-of-the-art on real-world datasets.
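To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of the energy-based view it describes: a scalar joint energy over candidate (input, concept, class) tuples, with class prediction obtained as a conditional probability p(y | x, c) ∝ exp(−E(x, c, y)). The linear "energy networks", dimensions, and decomposition below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper uses learned neural networks).
D_X, N_CONCEPTS, N_CLASSES = 8, 4, 3

# Random linear maps standing in for the learned energy networks.
W_xc = rng.normal(size=(D_X, N_CONCEPTS))        # input-concept compatibility
W_cy = rng.normal(size=(N_CONCEPTS, N_CLASSES))  # concept-class compatibility

def joint_energy(x, c, y):
    """Scalar energy of a candidate (input, concept, class) tuple.
    Lower energy means a more compatible tuple; this additive
    decomposition is purely illustrative."""
    e_xc = -c @ (x @ W_xc)    # how well concept vector c fits input x
    e_cy = -(c @ W_cy)[y]     # how well class y fits concept vector c
    return e_xc + e_cy

def predict_class(x, c):
    """p(y | x, c) ∝ exp(-E(x, c, y)): softmax over negated class energies."""
    energies = np.array([joint_energy(x, c, y) for y in range(N_CLASSES)])
    logits = -energies
    p = np.exp(logits - logits.max())  # stable softmax
    return p / p.sum()

x = rng.normal(size=D_X)
c = np.array([1.0, 0.0, 1.0, 0.0])   # candidate binary concept vector
probs = predict_class(x, c)
print(probs)
```

The same interface supports the other queries the abstract mentions: fixing a concept during intervention, or conditioning on a class label, simply changes which variables are held fixed while the remaining energies are composed and normalized.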
Keywords
Interpretability, Concepts, Energy-Based Model, Probabilistic Methods