Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
CoRR (2024)
Abstract
Large language models (LLMs) are typically prompted to follow a single
instruction per inference call. In this work, we analyze whether LLMs also hold
the capability to handle multiple instructions simultaneously, denoted as
Multi-Task Inference. For this purpose, we introduce the MTI Bench(Multi-Task
Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000
instances across 25 tasks. Each task in the MTI Bench involves 2 to 3
sub-tasks. As expected, we first demonstrate that Multi-Task Inference reduces
the total inference time by 1.46× on average, since it does not require
multiple inference calls. Interestingly, contrary to the expectation that LLMs
would perform better when tasks are divided, we find that state-of-the-art
LLMs, such as Llama-2-Chat-70B and GPT-4, show up to 7.3% improved
performance with Multi-Task Inference compared to Single-Task Inference on the
MTI Bench. We release the MTI Bench dataset and our code at
https://github.com/guijinSON/MTI-Bench.
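The distinction between Single-Task and Multi-Task Inference can be sketched as prompt construction: one inference call per sub-task instruction versus all sub-task instructions packed into a single call. The function names and prompt template below are illustrative assumptions, not the benchmark's actual code.

```python
# Illustrative sketch of Single-Task vs Multi-Task Inference prompting.
# Function names and the prompt wording are hypothetical, not from MTI-Bench.

def build_single_task_prompts(instructions, shared_input):
    """Single-Task Inference: one prompt (one inference call) per sub-task."""
    return [f"{inst}\n\nInput: {shared_input}" for inst in instructions]

def build_multi_task_prompt(instructions, shared_input):
    """Multi-Task Inference: all sub-tasks packed into a single call."""
    numbered = "\n".join(
        f"Task {i + 1}: {inst}" for i, inst in enumerate(instructions)
    )
    return (
        "Complete every task below, answering each in order.\n"
        f"{numbered}\n\nInput: {shared_input}"
    )

# Example: a task with 2 sub-tasks, as in the benchmark's 2-3 sub-task setup.
instructions = [
    "Summarize the passage in one sentence.",
    "List three keywords from the passage.",
]
passage = "LLMs can handle several instructions in one inference call."

single = build_single_task_prompts(instructions, passage)  # 2 calls needed
multi = build_multi_task_prompt(instructions, passage)     # 1 call needed
print(f"{len(single)} inference calls vs 1 inference call")
```

The inference-time saving reported in the abstract comes from this collapse of N calls into one; the surprising finding is that accuracy also improves rather than degrades.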