An LLM Compiler for Parallel Function Calling
CoRR (2023)
Abstract
Large Language Models (LLMs) have shown remarkable results on various complex
reasoning benchmarks. The reasoning capabilities of LLMs enable them to execute
function calls, using user-provided functions to overcome their inherent
limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of
access to private data. This development has expanded LLMs' scope to include
multi-function calling, where LLMs are equipped with a variety of functions and
select the proper functions based on the context. Multi-function calling
abilities of LLMs have catalyzed LLM-based software development, allowing them
to tackle more complex problems. However, current methods for multi-function
calling often require sequential reasoning and acting for each function, which
can result in high latency, high cost, and sometimes inaccurate behavior. To address
this, we introduce LLMCompiler, which executes functions in parallel to
efficiently orchestrate multi-function calling. Drawing from the principles of
classical compilers, LLMCompiler streamlines parallel function calling with
three components: (i) an LLM Planner, formulating execution strategies and
dependencies; (ii) a Task Fetching Unit, dispatching function calling tasks;
and (iii) an Executor, executing these tasks in parallel. LLMCompiler
automatically computes an optimized orchestration for the function calls and
can be used with open-source models such as LLaMA-2. We have benchmarked
LLMCompiler on a range of tasks including cases with non-trivial
inter-dependency between function calls, as well as cases that require dynamic
replanning based on intermediate results. We observe a consistent latency speedup
of up to 3.7x, cost savings of up to 6.7x, and an accuracy improvement of up to
~9% compared to ReAct. Additionally, LLMCompiler achieves up to 1.35x
latency gain over OpenAI's recent parallel function calling, while achieving
similar accuracy.
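
The three-component design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM Planner is replaced by a hand-written plan, and every name here (`Task`, `run_plan`, `search_a`, `search_b`, `add`) is hypothetical. It assumes each function call is an async Python function and that a task receives the results of its dependencies as positional arguments. The loop plays the role of the Task Fetching Unit (dispatching tasks whose dependencies are resolved), while `asyncio` stands in for the Executor that runs independent calls concurrently.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class Task:
    """One function call in the plan; `deps` names the tasks whose results it needs."""
    name: str
    func: Callable[..., Awaitable[Any]]
    deps: List[str] = field(default_factory=list)


async def run_plan(tasks: List[Task]) -> Dict[str, Any]:
    """Dispatch each task once its dependencies finish; run ready tasks concurrently."""
    results: Dict[str, Any] = {}
    pending = {t.name: t for t in tasks}
    running: Dict[asyncio.Task, str] = {}

    while pending or running:
        # Fetching step: collect tasks whose dependencies are all resolved.
        ready = [t for t in pending.values() if all(d in results for d in t.deps)]
        for t in ready:
            args = [results[d] for d in t.deps]
            running[asyncio.create_task(t.func(*args))] = t.name
            del pending[t.name]
        if not running:
            raise ValueError("plan has a cycle or an unknown dependency")
        # Execution step: wait until at least one in-flight call completes.
        done, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for finished in done:
            results[running.pop(finished)] = finished.result()
    return results


# Hypothetical tools standing in for user-provided functions.
async def search_a() -> int:
    await asyncio.sleep(0.1)  # simulated I/O latency
    return 3


async def search_b() -> int:
    await asyncio.sleep(0.1)  # simulated I/O latency
    return 4


async def add(a: int, b: int) -> int:
    return a + b


# Hand-written plan in place of the LLM Planner's output:
# "a" and "b" are independent, "sum" depends on both.
plan = [
    Task("a", search_a),
    Task("b", search_b),
    Task("sum", add, deps=["a", "b"]),
]
results = asyncio.run(run_plan(plan))
```

In this toy plan the two independent calls overlap rather than running back to back, which is the source of the latency savings the abstract reports. A sequential reason-and-act loop would instead issue `search_a`, wait for it, and only then issue `search_b`.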