Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks
CoRR (2024)
Abstract
This paper presents a novel concept learning framework for enhancing model
interpretability and performance in visual classification tasks. Our approach
appends an unsupervised explanation generator to the primary classifier network
and makes use of adversarial training. During training, the explanation module
is optimized to extract visual concepts from the classifier's latent
representations, while the GAN-based module learns to distinguish images
generated from those concepts from real images. This joint training scheme enables
the model to implicitly align its internally learned concepts with
human-interpretable visual properties. Comprehensive experiments demonstrate
the robustness of our approach, while producing coherent concept activations.
We analyse the learned concepts, showing their semantic concordance with object
parts and visual attributes. We also study how perturbations in the adversarial
training protocol impact both classification and concept acquisition. In
summary, this work presents a significant step towards building inherently
interpretable deep vision models with task-aligned concept representations, a
key enabler for developing trustworthy AI for real-world perception tasks.
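The abstract describes three interacting parts: a primary classifier, an explanation generator that maps the classifier's latent representations to concept activations (and decodes them back to image space), and a GAN-style discriminator that separates concept-generated images from real ones. The paper does not give implementation details, so the following is a minimal NumPy sketch of how the joint objective could be structured; all layer shapes, the decoder, and the trade-off weight `lambda_adv` are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def bce(p, target):
    # Binary cross-entropy against a constant 0/1 target.
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean()

# Toy batch: 4 "images" flattened to 16-d vectors, 3 classes, 5 concepts.
batch, dim, n_classes, n_concepts = 4, 16, 3, 5
images = rng.normal(size=(batch, dim))
labels = rng.integers(0, n_classes, size=batch)

# Primary classifier: latent features z, then class logits (random weights).
W_feat = rng.normal(size=(dim, 8))
W_cls = rng.normal(size=(8, n_classes))
z = np.tanh(images @ W_feat)                  # latent representation
cls_loss = cross_entropy(softmax(z @ W_cls), labels)

# Explanation generator: latents -> concept activations -> decoded image.
W_conc = rng.normal(size=(8, n_concepts))
W_dec = rng.normal(size=(n_concepts, dim))
concepts = sigmoid(z @ W_conc)                # concept activations in [0, 1]
reconstructed = concepts @ W_dec              # concept-generated images

# Discriminator: push real images toward 1, concept-generated toward 0.
w_disc = rng.normal(size=dim)
d_real = sigmoid(images @ w_disc)
d_fake = sigmoid(reconstructed @ w_disc)
disc_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)

# The generator side is trained adversarially, i.e. to fool the discriminator.
gen_loss = bce(d_fake, 1.0)

# Hypothetical joint objective for classifier + explanation module;
# in practice this and disc_loss would be minimized in alternating steps.
lambda_adv = 0.1
total_loss = cls_loss + lambda_adv * gen_loss
print(float(total_loss))
```

In an actual training loop the discriminator step (minimizing `disc_loss`) would alternate with the classifier/generator step (minimizing `total_loss`), which is what implicitly pressures the concept activations toward visually plausible, human-interpretable properties.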