Measuring Moral Inconsistencies in Large Language Models
CoRR (2024)
Abstract
A Large Language Model (LLM) is considered consistent if semantically
equivalent prompts produce semantically equivalent responses. Despite recent
advancements showcasing the impressive capabilities of LLMs in conversational
systems, we show that even state-of-the-art LLMs are highly inconsistent in
their generations, which calls their reliability into question. Prior research has
tried to measure this consistency with task-specific accuracies. However, this
approach is unsuitable for moral scenarios, such as the trolley problem, which
have no single “correct” answer. To address this issue, we propose a novel
information-theoretic measure called Semantic Graph Entropy (SGE) to measure
the consistency of an LLM in moral scenarios. We leverage “Rules of
Thumb” (RoTs) to explain a model's decision-making strategies and further
enhance our metric. Compared to existing consistency metrics, SGE correlates
better with human judgments across five LLMs. In the future, we aim to
investigate the root causes of LLM inconsistencies and propose improvements.
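
The abstract does not spell out how SGE is computed. As a rough, hypothetical illustration of an entropy-based consistency measure over semantically clustered responses (not the paper's actual formulation), the Python sketch below groups a model's generations for paraphrased prompts by a user-supplied equivalence judgment and computes Shannon entropy over the resulting cluster sizes. The are_equivalent callable is an assumed stand-in for a semantic-equivalence check such as an NLI model or an embedding-similarity threshold.

import math

def semantic_cluster_entropy(responses, are_equivalent):
    """Shannon entropy over clusters of semantically equivalent responses.

    responses: list of model generations for paraphrases of one scenario.
    are_equivalent: hypothetical callable deciding whether two responses
        are semantically equivalent (e.g., an NLI model or an
        embedding-similarity threshold).
    Returns 0.0 when every response falls in one cluster (perfectly
    consistent); larger values mean the answers diverge across paraphrases.
    """
    clusters = []  # each cluster holds mutually equivalent responses
    for r in responses:
        for c in clusters:
            if are_equivalent(r, c[0]):
                c.append(r)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([r])
    n = len(responses)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log2(p) for p in probs)

# Toy usage, with exact string match standing in for semantic equivalence:
answers = ["pull the lever", "pull the lever", "do nothing", "pull the lever"]
print(semantic_cluster_entropy(answers, lambda a, b: a == b))  # ~0.811 bits

Under this toy measure, a perfectly consistent model scores 0 bits, and the score grows as the response distribution spreads over more semantic clusters; the paper's actual SGE additionally incorporates graph structure and Rules of Thumb, which this sketch does not attempt to model.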