SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification.

Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

International Conference on Architectural Support for Programming Languages and Operating Systems (2024)
