SELFEXPLAIN - A Self-Explaining Architecture for Neural Text Classifiers.

EMNLP(2021)

引用 42|浏览102
暂无评分
摘要
We introduce SelfExplain, a novel self-explaining framework that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SelfExplain facilitates interpretability without sacrificing performance. Most importantly, explanations from SelfExplain are perceived as more understandable, adequately justifying and trustworthy by human judges compared to existing widely-used baselines.
更多
查看译文
关键词
text,architecture,self-explaining
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要