Large Language Models As Faithful Explainers
CoRR(2024)
Abstract
Large Language Models (LLMs) have recently become proficient at addressing
complex tasks by utilizing their rich internal knowledge and reasoning ability.
However, this complexity hinders traditional input-focused explanation
algorithms from explaining the decision-making processes of LLMs. Recent
advancements have thus enabled LLMs to self-explain their predictions through a
single feed-forward inference, in natural language. However, natural
language explanations are often criticized for lacking faithfulness, since such
explanations may not accurately reflect the decision-making behaviors of the
LLMs. In this work, we introduce a generative explanation framework, xLLM, to
improve the faithfulness of the natural language explanations provided by
LLMs. Specifically, we propose an evaluator to quantify the
faithfulness of a natural language explanation, and we enhance faithfulness through an
iterative optimization process in xLLM that aims to maximize the
faithfulness scores. Experiments conducted on three NLU datasets demonstrate
that xLLM can significantly improve the faithfulness of generated explanations,
which are in alignment with the behaviors of the LLMs.
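The evaluator-guided loop described above can be sketched in miniature. Everything here is an illustrative assumption, not the paper's implementation: the toy `faithfulness_score` simply checks whether the explanation mentions the predicted label, and the toy `revise` step appends that label, standing in for the LLM-based evaluator and re-generation the abstract refers to.

```python
# Hypothetical sketch of an xLLM-style optimization loop: score an
# explanation's faithfulness, then iteratively revise it to raise the score.
# Both helper functions are toy stand-ins (assumptions), not the paper's method.

def faithfulness_score(explanation: str, prediction: str) -> float:
    """Toy evaluator: rewards explanations that mention the predicted label."""
    return 1.0 if prediction.lower() in explanation.lower() else 0.0

def revise(explanation: str, prediction: str) -> str:
    """Toy reviser: grounds the explanation in the model's prediction."""
    return f"{explanation} Hence the predicted label is {prediction}."

def optimize_explanation(explanation: str, prediction: str, max_iters: int = 5):
    """Keep the best-scoring explanation seen across revision rounds."""
    best, best_score = explanation, faithfulness_score(explanation, prediction)
    for _ in range(max_iters):
        if best_score >= 1.0:  # stop once the evaluator is satisfied
            break
        candidate = revise(best, prediction)
        score = faithfulness_score(candidate, prediction)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

In the actual framework, the evaluator and reviser would both be model-driven; this sketch only conveys the maximize-faithfulness control flow.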