InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks
CoRR (2024)
Abstract
Real-world interpretability for neural networks is a tradeoff among three
concerns: 1) requiring humans to trust an approximate explanation (e.g.,
post-hoc approaches), 2) compromising the understandability of the
explanation (e.g., automatically identified feature masks), or 3)
compromising model performance (e.g., decision trees). These shortcomings are
unacceptable for human-facing domains, like education, healthcare, or natural
language, which require trustworthy explanations, actionable interpretations,
and accurate predictions. In this work, we present InterpretCC (interpretable
conditional computation), a family of interpretable-by-design neural networks
that guarantee human-centric interpretability while maintaining performance
comparable to state-of-the-art models by adaptively and sparsely activating
features before prediction. We extend this idea into an interpretable
mixture-of-experts model that allows humans to specify topics of interest,
discretely separates the feature space for each data point into topical
subnetworks, and adaptively and sparsely activates these topical subnetworks.
We demonstrate variations of the InterpretCC architecture for text and tabular
data across several real-world benchmarks: six online education courses, news
classification, breast cancer diagnosis, and review sentiment.