Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
CoRR (2024)
Abstract
Large Language Model (LLM) agents significantly extend the capabilities of
standalone LLMs, empowering them to interact with external tools (e.g., APIs,
functions) and complete complex tasks in a self-directed fashion. The challenge
of tool use demands that LLMs not only understand user queries and generate
answers but also excel in task planning, memory management, tool invocation,
and result summarization. While traditional approaches focus on training a
single LLM with all these capabilities, performance limitations become
apparent, particularly with smaller models. Moreover, the entire LLM may
require retraining when tools are updated. To overcome these challenges, we
propose a novel strategy that decomposes the aforementioned capabilities into a
planner, caller, and summarizer. Each component is implemented by a single LLM
that focuses on a specific capability and collaborates with other components to
accomplish the task. This modular framework facilitates individual updates and
the potential use of smaller LLMs for building each capability. To effectively
train this framework, we introduce a two-stage training paradigm. First, we
fine-tune a backbone LLM on the entire dataset without discriminating
sub-tasks, providing the model with a comprehensive understanding of the task.
Second, the fine-tuned LLM is used to instantiate the planner, caller, and
summarizer, which are then continually fine-tuned on their respective
sub-tasks. Evaluation across various tool-use benchmarks shows that our
proposed multi-LLM framework surpasses the traditional single-LLM approach,
highlighting its efficacy and advantages in tool learning.
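
To make the decomposition concrete, below is a minimal sketch of how a planner, caller, and summarizer could be wired into one agent loop. The function names (`planner_llm`, `caller_llm`, `summarizer_llm`, `run_agent`), the `CALL:`/`FINISH` plan format, and the stub implementations are hypothetical illustrations, not the paper's actual code or interfaces; in the proposed framework each role would be a separately fine-tuned LLM sharing a common memory of tool results.

```python
# Hypothetical stand-ins for three specialized LLMs. In the paper's framework
# each role is a separately fine-tuned model; here they are stubbed so the
# control flow is runnable.
def planner_llm(task: str, memory: list[str]) -> str:
    """Decide the next step: either 'CALL: <tool>' or 'FINISH'."""
    return "FINISH" if memory else "CALL: weather_api"

def caller_llm(tool_request: str) -> str:
    """Format and issue the tool invocation, returning the raw result."""
    return f"result of {tool_request}"

def summarizer_llm(task: str, memory: list[str]) -> str:
    """Compose the final answer from the accumulated tool results."""
    return f"Answer to '{task}' based on: {'; '.join(memory)}"

def run_agent(task: str, max_steps: int = 5) -> str:
    memory: list[str] = []  # shared memory of tool results
    for _ in range(max_steps):
        plan = planner_llm(task, memory)
        if plan == "FINISH":
            break
        tool_request = plan.removeprefix("CALL: ")
        memory.append(caller_llm(tool_request))  # invoke the tool via the caller
    return summarizer_llm(task, memory)

print(run_agent("What's the weather in Paris?"))
```

Because each role sits behind its own function boundary, a single component (e.g., the caller, when a tool's API changes) can be retrained or swapped without touching the others, which is the modularity benefit the abstract describes.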