SynCode: LLM Generation with Grammar Augmentation
arXiv (2024)
Abstract
LLMs are widely used in complex AI applications. These applications
underscore the need for LLM outputs to adhere to a specific format, for their
integration with other components in the systems. Typically, the format rules,
e.g., for data serialization formats such as JSON and YAML, or for code in a
programming language, are expressed as a context-free grammar (CFG). Due to the hallucinations
and unreliability of LLMs, instructing LLMs to adhere to specified syntax
becomes an increasingly important challenge.
We present SynCode, a novel framework for efficient and general syntactical
decoding with LLMs, to address this challenge. SynCode leverages the CFG of a
formal language, utilizing an offline-constructed efficient lookup table called
DFA mask store, based on the deterministic finite automata (DFA) of the language
grammar terminals. We demonstrate SynCode's soundness and completeness given
the CFG of the formal language, presenting its ability to retain syntactically
valid tokens while rejecting invalid ones. SynCode seamlessly integrates with
any language defined by CFG, as evidenced by experiments focusing on generating
JSON, Python, and Go outputs. Our experiments evaluating the effectiveness of
SynCode for JSON generation demonstrate that SynCode eliminates all syntax
errors and significantly outperforms state-of-the-art baselines. Furthermore,
our results underscore how SynCode eliminates 96.07% of syntax
errors in generated Python and Go code, showcasing its substantial impact on
enhancing syntactical precision in LLM generation. Our code is available at
https://github.com/uiuc-focal-lab/syncode
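
To make the core idea concrete, here is a minimal, hypothetical sketch (not the SynCode implementation) of DFA-based token masking: a vocabulary is filtered down to the tokens whose characters keep a terminal's DFA in a live (non-dead) state, so syntactically invalid continuations are rejected before sampling. The toy DFA below, which recognizes an integer terminal, and all names in the snippet are illustrative assumptions.

```python
# Toy DFA for an "integer" terminal: state 0 = start, state 1 = seen digits.
# Missing transitions mean the DFA dies (the prefix can never match).
TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
}

def classify(ch):
    """Map a character to its DFA input symbol class."""
    return "digit" if ch.isdigit() else "other"

def walk(state, token):
    """Advance the DFA over a token's characters; None means a dead state."""
    for ch in token:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return None
    return state

def mask(vocab, state=0):
    """Keep only the tokens that leave the DFA alive from the given state."""
    return [t for t in vocab if walk(state, t) is not None]

vocab = ["12", "3", "abc", "4x", "56"]
print(mask(vocab))  # → ['12', '3', '56']
```

In SynCode this per-state token mask is precomputed offline into the DFA mask store, so that at decoding time filtering the vocabulary is a lookup rather than a per-token DFA walk.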