Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models
arXiv (2024)
Abstract
Hallucinations in large language models (LLMs) refer to the phenomenon of
LLMs producing responses that are coherent yet factually inaccurate. This issue
undermines the effectiveness of LLMs in practical applications, necessitating
research into detecting and mitigating hallucinations of LLMs. Previous studies
have mainly concentrated on post-processing techniques for hallucination
detection, which tend to be computationally intensive and limited in
effectiveness due to their separation from the LLM's inference process. To
overcome these limitations, we introduce MIND, an unsupervised training
framework that leverages the internal states of LLMs for real-time
hallucination detection without requiring manual annotations. Additionally, we
present HELM, a new benchmark for evaluating hallucination detection across
multiple LLMs, featuring diverse LLM outputs and the internal states of LLMs
during their inference process. Our experiments demonstrate that MIND
outperforms existing state-of-the-art methods in hallucination detection.
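To make the core idea concrete, below is a minimal sketch of detecting hallucinations from an LLM's internal states: hidden states from the decoder are pooled and passed to a small probe that outputs a hallucination score. The choice of base model (gpt2 as a placeholder), mean pooling over the final layer, and the MLP probe architecture are all illustrative assumptions; the abstract does not specify MIND's exact design or its unsupervised training procedure.

```python
# Hedged sketch: scoring a generation from decoder hidden states with a small
# probe. The pooling strategy and probe architecture are assumptions for
# illustration, not the paper's specified design.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM exposing hidden states works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()


class HallucinationProbe(nn.Module):
    """Small MLP mapping a pooled hidden state to a score in [0, 1]."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.net(pooled)


probe = HallucinationProbe(model.config.hidden_size)


@torch.no_grad()
def score_text(text: str) -> float:
    """Return a hallucination score for `text` from the LLM's internal states.

    Hidden states come from the final decoder layer and are mean-pooled over
    tokens. In an unsupervised setup, the probe would be trained on
    automatically constructed data rather than human annotations.
    """
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    last_hidden = outputs.hidden_states[-1]  # (1, seq_len, hidden_size)
    pooled = last_hidden.mean(dim=1)         # (1, hidden_size)
    return probe(pooled).item()


if __name__ == "__main__":
    print(score_text("The Eiffel Tower is located in Berlin."))
```

Because the hidden states are already produced during inference, a probe of this kind adds only a negligible forward pass on top of generation, which is what enables real-time detection as opposed to post-processing pipelines.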