BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing
NeurIPS (2022)
Abstract
Recent work has shown that generation from a prompted or fine-tuned language
model can perform well at semantic parsing when the output is constrained to be
a valid semantic representation. We introduce BenchCLAMP, a Benchmark to
evaluate Constrained LAnguage Model Parsing, that includes context-free
grammars for seven semantic parsing datasets and two syntactic parsing datasets
with varied output representations, as well as a constrained decoding interface
to generate only valid outputs covered by these grammars. We provide low,
medium, and high resource splits for each dataset, allowing accurate comparison
of various language models under different data regimes. Our benchmark supports
evaluation of language models using prompt-based learning as well as
fine-tuning. We benchmark eight language models, including two GPT-3 variants
available only through an API. Our experiments show that encoder-decoder
pretrained language models can achieve similar performance or surpass
state-of-the-art methods for syntactic and semantic parsing when the model
output is constrained to be valid.
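The constrained decoding the abstract describes can be illustrated with a toy sketch (hypothetical example, not the BenchCLAMP API): at each generation step, candidate tokens are filtered so that the partial output remains a valid prefix of some string the grammar accepts, guaranteeing the final output is well-formed.

```python
# Toy "grammar": here just an explicit set of valid outputs, standing in for a
# context-free grammar. All names below are illustrative, not from BenchCLAMP.
VALID_OUTPUTS = {"(call foo)", "(call bar)", "(call foo bar)"}

def is_valid_prefix(text):
    """True if some valid output starts with `text`."""
    return any(s.startswith(text) for s in VALID_OUTPUTS)

def constrained_greedy_decode(score_fn, vocab, max_len=30):
    """Greedy decoding that only extends with tokens keeping a valid prefix.

    score_fn(prefix, token) -> float stands in for a language model's logit.
    """
    out = ""
    while len(out) < max_len:
        if out in VALID_OUTPUTS:      # complete valid parse reached
            return out
        candidates = [t for t in vocab if is_valid_prefix(out + t)]
        if not candidates:            # dead end: no legal continuation
            return out
        out += max(candidates, key=lambda t: score_fn(out, t))
    return out

# A mock scorer that prefers characters of "bar"; the constraint still forces
# every step to stay inside the grammar.
vocab = list("(call forba) ")
def mock_score(prefix, token):
    return {"b": 2.0, "a": 1.5, "r": 1.4}.get(token, 1.0)

print(constrained_greedy_decode(mock_score, vocab))  # → (call bar)
```

Real implementations apply the same idea at the tokenizer level, masking invalid next-token logits from an incremental grammar parser rather than enumerating valid outputs.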