The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics
arXiv (2023)
Abstract
Deploying large language models (LLMs) encounters challenges due to intensive
computational and memory requirements. Our research examines vocabulary
trimming (VT), inspired by the practice of restricting embedding entries to the
language of interest, to bolster time and memory efficiency. While such modifications have
been proven effective in tasks like machine translation, tailoring them to LLMs
demands specific modifications given the diverse nature of LLM applications. We
apply two language heuristics to trim the full vocabulary - Unicode-based
script filtering and corpus-based selection - to different LLM families and
sizes. The methods are straightforward, interpretable, and easy to implement.
It is found that VT reduces the memory usage of small models by nearly 50% and
has an upper bound of 25% improvement in generation speed. However, we expose the
limitations of these methods: they do not perform consistently well for
each language, with diminishing returns in larger models.
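The abstract gives no implementation details, so the following is a minimal Python sketch of how the two heuristics might look, not the authors' code. It assumes a SentencePiece-style `vocab` dict mapping token strings to ids and, for the second heuristic, a `tokenize` callable returning token ids; matching scripts via Unicode character names is one plausible reading of "Unicode-based script filtering".

```python
import unicodedata

def unicode_script_filter(vocab, scripts=("LATIN",)):
    """Heuristic 1: keep token ids whose alphabetic characters all belong to
    the target script(s). Digits, punctuation, and whitespace are shared
    across languages, so they are always kept."""
    kept = set()
    for token, idx in vocab.items():
        text = token.replace("\u2581", " ")  # SentencePiece word-boundary marker
        if all(
            not ch.isalpha()
            or any(unicodedata.name(ch, "").startswith(s) for s in scripts)
            for ch in text
        ):
            kept.add(idx)
    return kept

def corpus_selection_filter(tokenize, corpus_lines):
    """Heuristic 2: keep only token ids that actually occur when tokenizing
    a sample corpus in the language of interest."""
    kept = set()
    for line in corpus_lines:
        kept.update(tokenize(line))
    return kept

# Toy usage: the Cyrillic token is dropped; ids 0, 1, and 3 survive.
vocab = {"the": 0, "\u2581hello": 1, "привет": 2, "42": 3}
assert unicode_script_filter(vocab) == {0, 1, 3}
```

Either way, the surviving ids would then be used to slice the input embedding matrix and the LM head to the kept rows (with a remapping of token ids). This is consistent with the savings being largest for small models, where the embedding matrices account for a larger share of the total parameters.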