A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
CoRR (2023)
Abstract
Large language models (LLMs) have an impressive ability to draw on novel
information supplied in their context. Yet the mechanisms underlying this
contextual grounding remain unknown, especially in situations where contextual
information contradicts factual knowledge stored in the parameters, which LLMs
also excel at recalling. Favoring the contextual information is critical for
retrieval-augmented generation methods, which enrich the context with
up-to-date information, hoping that grounding can rectify outdated or noisy
stored knowledge. We present a novel method to study grounding abilities using
Fakepedia, a dataset of counterfactual texts constructed to clash with a
model's internal parametric knowledge. We benchmark various LLMs with Fakepedia
and then conduct a causal mediation analysis of LLM components, based on our
Masked Grouped Causal Tracing (MGCT) method, while the models answer Fakepedia queries.
Within this analysis, we identify distinct computational patterns between
grounded and ungrounded responses. Finally, we demonstrate that grounded
responses can be distinguished from ungrounded ones through computational analysis
alone. Our results, together with existing findings about factual recall
mechanisms, provide a coherent narrative of how grounding and factual recall
mechanisms interact within LLMs.