
Universal Model Routing for Efficient LLM Inference.

Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Congchao Wang, Zifeng Wang, Alec Go, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, Sanjiv Kumar

CoRR (2025)

Abstract
Model routing is a simple technique for reducing the inference cost of large language models (LLMs), wherein one maintains a pool of candidate LLMs, and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose UniRoute, a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts. Based on this, we detail two effective instantiations of UniRoute, relying on cluster-based routing and a learned cluster map respectively. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound. Experiments on a range of public benchmarks show the effectiveness of UniRoute in routing amongst more than 30 unseen LLMs.
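As a rough illustration of the LLM feature vectors mentioned in the abstract, the sketch below scores a candidate model on a fixed set of representative prompts and stacks the per-prompt outcomes into one vector, so a model first seen at test time can still be placed in the same feature space. The 0/1 correctness scoring and the hypothetical llm_generate and grader callables are assumptions for illustration only, not the paper's exact featurization.

```python
# Minimal sketch (assumed featurization): describe an LLM by its per-prompt
# correctness on a fixed set of representative prompts.
import numpy as np

def llm_feature_vector(llm_generate, representative_prompts, reference_answers, grader):
    """Stack per-prompt correctness of a candidate LLM into one feature vector.

    llm_generate: callable, prompt -> model output (the candidate LLM).
    grader: callable, (output, reference) -> bool, e.g. exact match.
    """
    return np.array([
        float(grader(llm_generate(p), ref))
        for p, ref in zip(representative_prompts, reference_answers)
    ])

if __name__ == "__main__":
    # Toy stand-in model and exact-match grading, purely for demonstration.
    prompts = ["2+2=?", "Capital of France?"]
    answers = ["4", "Paris"]
    toy_llm = lambda p: "4" if "2+2" in p else "Paris"
    print(llm_feature_vector(toy_llm, prompts, answers, lambda out, ref: out == ref))
    # -> [1. 1.]
```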

【Key Points】: This paper proposes a universal model routing approach for dynamic model pools that effectively reduces the inference cost of large language models (LLMs); its main novelty is handling new, previously unobserved LLMs that become available at test time.

【Method】: Each LLM is represented as a feature vector derived from its predictions on a set of representative prompts; on this basis, two effective instantiations are proposed, based on cluster-based routing and a learned cluster map respectively (a minimal routing sketch is given after these points).

【Experiments】: Experiments on multiple public benchmarks, routing among more than 30 unseen LLMs, demonstrate the effectiveness of the proposed approach.
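The following is a minimal sketch of the cluster-based routing idea, assuming prompt embeddings and per-prompt quality labels are available for a small validation set. The quality threshold, the cost model, and the use of scikit-learn's KMeans are illustrative choices, not the paper's implementation.

```python
# Hypothetical sketch of cluster-based routing: cluster validation prompts,
# estimate each LLM's quality per cluster, then route a new prompt to the
# cheapest LLM whose estimated quality in its cluster clears a threshold.
import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_router(val_embeddings, llm_correctness, n_clusters=20, seed=0):
    """val_embeddings: (n_prompts, d) array of validation-prompt embeddings.
    llm_correctness: dict mapping llm_name -> (n_prompts,) array of 0/1 scores.
    Returns a fitted KMeans model and per-cluster quality estimates per LLM."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    cluster_ids = km.fit_predict(val_embeddings)
    cluster_quality = {
        name: np.array([
            scores[cluster_ids == c].mean() if np.any(cluster_ids == c) else 0.0
            for c in range(n_clusters)
        ])
        for name, scores in llm_correctness.items()
    }
    return km, cluster_quality

def route(prompt_embedding, km, cluster_quality, llm_costs, quality_floor=0.7):
    """Pick the cheapest LLM whose estimated quality in the prompt's cluster
    reaches quality_floor; otherwise fall back to the highest-quality LLM."""
    c = int(km.predict(prompt_embedding.reshape(1, -1))[0])
    feasible = [(llm_costs[name], name)
                for name, q in cluster_quality.items() if q[c] >= quality_floor]
    if feasible:
        return min(feasible)[1]
    return max(cluster_quality.items(), key=lambda kv: kv[1][c])[0]
```

In this sketch, a newly added LLM only needs its correctness scores on the validation prompts to obtain per-cluster quality estimates, so the router itself does not have to be retrained when the candidate pool changes.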