KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions
arxiv(2024)
摘要
Large language models (LLMs) adapted to follow user instructions are now
widely deployed as conversational agents. In this work, we examine one
increasingly common instruction-following task: providing writing assistance to
compose a long-form answer. To evaluate the capabilities of current LLMs on
this task, we construct KIWI, a dataset of knowledge-intensive writing
instructions in the scientific domain. Given a research question, an initial
model-generated answer and a set of relevant papers, an expert annotator
iteratively issues instructions for the model to revise and improve its answer.
We collect 1,260 interaction turns from 234 interaction sessions with three
state-of-the-art LLMs. Each turn includes a user instruction, a model response,
and a human evaluation of the model response. Through a detailed analysis of
the collected responses, we find that all models struggle to incorporate new
information into an existing answer, and to perform precise and unambiguous
edits. Further, we find that models struggle to judge whether their outputs
successfully followed user instructions, with accuracy at least 10 points short
of human agreement. Our findings indicate that KIWI will be a valuable resource
to measure progress and improve LLMs' instruction-following capabilities for
knowledge intensive writing tasks.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要