Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
CoRR (2024)
Abstract
Large language models (LLMs) have achieved success in acting as agents, which
interact with environments through tools like search engines. However, LLMs are
not optimized specifically for tool use during training or alignment, limiting
their effectiveness as agents. To resolve this problem, previous work has
collected interaction trajectories between GPT-4 and environments, and
fine-tuned smaller models with them. As part of this, the standard approach has
been to simply discard trajectories that do not finish the task successfully,
which, on the one hand, leads to a significant waste of data and resources, and
on the other hand, has the potential to limit the possible optimization paths
during fine-tuning. In this paper, we contend that large language models can
learn from failures through appropriate data cleaning and fine-tuning
strategies. We conduct experiments on mathematical reasoning, multi-hop
question answering, and strategic question answering tasks. Experimental
results demonstrate that compared to solely using positive examples,
incorporating negative examples enhances model performance by a large margin.
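The abstract does not spell out the data cleaning or fine-tuning strategy, so the following is a minimal sketch of one plausible way to keep failed trajectories in the training mix rather than discarding them: tag each trajectory with an instruction prefix marking it as a success or a failure, so the model can learn from both while inference always conditions on the success prefix. All names here (Trajectory, build_sft_examples, the prefix wording) are hypothetical illustrations, not taken from the paper.

```python
# A minimal sketch (not the paper's exact recipe) of building a fine-tuning
# dataset that retains failed agent trajectories instead of discarding them.
# Hypothetical scheme: positive and negative trajectories receive distinct
# instruction prefixes, so the model can contrast them during training while
# inference always uses the "success" prefix.

from dataclasses import dataclass
from typing import List

# Hypothetical prefixes; the actual prompt wording is an assumption.
POS_PREFIX = "You are an expert agent. Solve the task step by step."
NEG_PREFIX = "Below is a flawed attempt at the task that fails to reach the goal."

@dataclass
class Trajectory:
    task: str        # task description shown to the agent
    actions: str     # serialized trace (thoughts, tool calls, observations)
    success: bool    # whether the final answer was judged correct

def build_sft_examples(trajectories: List[Trajectory]) -> List[dict]:
    """Convert raw trajectories into prompt/completion pairs for fine-tuning."""
    examples = []
    for traj in trajectories:
        prefix = POS_PREFIX if traj.success else NEG_PREFIX
        examples.append({
            "prompt": f"{prefix}\n\nTask: {traj.task}\n",
            "completion": traj.actions,
        })
    return examples

if __name__ == "__main__":
    data = [
        Trajectory("What is 17 * 24?", "Thought: ... Answer: 408", success=True),
        Trajectory("What is 17 * 24?", "Thought: ... Answer: 398", success=False),
    ]
    for ex in build_sft_examples(data):
        print(ex["prompt"])
```

The point of such a scheme is that negative trajectories still teach the model about the task format and environment dynamics, while the failure prefix keeps it from imitating them when prompted for a successful solution at inference time.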