Executable Code Actions Elicit Better LLM Agents
CoRR(2024)
摘要
Large Language Model (LLM) agents, capable of performing a broad range of
actions, such as invoking tools and controlling robots, show great potential in
tackling real-world challenges. LLM agents are typically prompted to produce
actions by generating JSON or text in a pre-defined format, which is usually
limited by constrained action space (e.g., the scope of pre-defined tools) and
restricted flexibility (e.g., inability to compose multiple tools). This work
proposes to use executable Python code to consolidate LLM agents' actions into
a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct
can execute code actions and dynamically revise prior actions or emit new
actions upon new observations through multi-turn interactions. Our extensive
analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that
CodeAct outperforms widely used alternatives (up to 20
The encouraging performance of CodeAct motivates us to build an open-source LLM
agent that interacts with environments by executing interpretable code and
collaborates with users using natural language. To this end, we collect an
instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn
interactions using CodeAct. We show that it can be used with existing data to
improve models in agent-oriented tasks without compromising their general
capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with
Python interpreter and uniquely tailored to perform sophisticated tasks (e.g.,
model training) using existing libraries and autonomously self-debug.
更多查看译文
AI 理解论文
溯源树
样例
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要