GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding.
CoRR (2023)

Abstract
Language models can serve as a valuable tool for software developers to
increase productivity. Large generative models can be used for code generation
and code completion, while smaller encoder-only models are capable of
performing code search tasks using natural language queries. These capabilities
are heavily influenced by the quality and diversity of the available training
data. Source code datasets used for training usually focus on the most popular
languages and testing is mostly conducted on the same distributions, often
overlooking low-resource programming languages. Motivated by the NLP
generalization taxonomy proposed by Hupkes et al., we propose a new benchmark
dataset called GenCodeSearchNet (GeCS), which builds upon existing natural
language code search datasets to systematically evaluate the programming language
understanding generalization capabilities of language models. As part of the
full dataset, we introduce a new, manually curated subset, StatCodeSearch, which
focuses on R, a popular but so far underrepresented programming language that
is often used by researchers outside the field of computer science. For
evaluation and comparison, we collect several baseline results using fine-tuned
BERT-style models and GPT-style large language models in a zero-shot setting.