LLMCRIT: Teaching Large Language Models to Use Criteria
arXiv (2024)
Abstract
Humans follow criteria when they execute tasks, and these criteria are
directly used to assess the quality of task completion. Therefore, having
models learn to use criteria to provide feedback can help humans or models to
perform tasks better. However, existing research in this field tends to
consider only a limited set of criteria or quality assessment aspects. To fill
this gap, we propose a general framework that enables large language models
(LLMs) to use comprehensive criteria for a task in delivering natural language
feedback on task execution. In particular, we present a model-in-the-loop
framework that semi-automatically derives criteria from collected guidelines
for different writing tasks and constructs in-context demonstrations for each
criterion. We operationalize this idea on three tasks drawn from real-world
scenarios: paper introduction writing, Python code writing, and Reddit post
writing. We then evaluate our feedback generation framework on these tasks
using different LLMs.
The results reveal the fine-grained effects of incorporating criteria and
demonstrations and provide valuable insights into how to teach LLMs to use
criteria more effectively.
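
To make the described pipeline concrete, below is a minimal Python sketch of how a single derived criterion and its in-context demonstration might be assembled into a feedback prompt. The `Criterion` structure, its field names, and the `build_feedback_prompt` helper are illustrative assumptions rather than the paper's released implementation, and the LLM call itself is deliberately omitted since the paper evaluates several different models.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One criterion derived (semi-automatically) from a task guideline.

    Field names are hypothetical; the abstract does not specify the
    framework's internal representation.
    """
    name: str           # short label, e.g. "states the research gap"
    description: str    # natural-language statement of the criterion
    demonstration: str  # in-context example: a text excerpt with feedback


def build_feedback_prompt(criterion: Criterion, submission: str) -> str:
    """Assemble a single-criterion feedback prompt: the criterion, one
    worked demonstration, then the text to be critiqued."""
    return (
        f"Criterion: {criterion.name}\n"
        f"Definition: {criterion.description}\n\n"
        f"Example of applying this criterion:\n"
        f"{criterion.demonstration}\n\n"
        f"Now give natural-language feedback on the following text with "
        f"respect to this criterion only:\n{submission}"
    )


if __name__ == "__main__":
    crit = Criterion(
        name="states the research gap",
        description=(
            "The introduction should explicitly identify the gap in "
            "prior work that the paper addresses."
        ),
        demonstration=(
            "Text: 'Prior work studies X extensively...'\n"
            "Feedback: 'The gap relative to prior work is never made "
            "explicit; state what X leaves unsolved.'"
        ),
    )
    draft = "We propose a new method for abstractive summarization."
    # The resulting prompt would be sent to an LLM of choice; that call
    # is omitted here because the paper compares multiple LLMs.
    print(build_feedback_prompt(crit, draft))
```

Issuing one such prompt per criterion would keep the generated feedback fine-grained, in line with the per-criterion demonstrations the framework constructs.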