Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs
arXiv (2024)
Abstract
Humans often express their communicative intents indirectly or non-literally,
which requires their interlocutors – human or AI – to understand beyond the
literal meaning of words. While most existing work has focused on
discriminative evaluations, we present a new approach to generatively evaluate
large language models' (LLMs') intention understanding by examining their
responses to non-literal utterances. Ideally, an LLM should respond in line
with the true intention of a non-literal utterance, not its literal
interpretation. Our findings show that LLMs struggle to generate pragmatically
relevant responses to non-literal language, achieving only 50-55% accuracy on
average. While explicitly providing oracle intentions significantly improves
performance (e.g., 75% […]), this still indicates challenges
in leveraging given intentions to produce appropriate responses. Using
chain-of-thought to make models spell out intentions yields much smaller gains
(60% […]). These findings suggest that LLMs are not yet
effective pragmatic interlocutors, highlighting the need for better approaches
for modeling intentions and utilizing them for pragmatic generation.