NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava
NeurIPS 2024 (2024)
Keywords: large language model, efficiency, CPU inference, attention