A Theory of Equivalence-Preserving Program Embeddings

ICLR 2023

Abstract
Program embeddings are used to solve tasks such as code clone detection and semantic labeling. Solutions to these semantic tasks should be invariant to semantics-preserving program transformations. When a program embedding function satisfies this invariance, we call it an equivalence-preserving program embedding function. We say a programming language can be tractably embedded when we can construct an equivalence-preserving program embedding function that runs in time polynomial in the program/input length and produces embeddings whose size is proportional to the input length. Determining whether a programming language can be tractably embedded is the equivalence-preserving program embedding problem. We formalize this problem and theoretically characterize when programming languages can be tractably embedded. To validate our theoretical results, we use the BERT-Tiny model to learn an equivalence-preserving program embedding function for a programming language that can be tractably embedded, and we show that the model fails to learn such a function for a similar language that is intractable to embed.
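The definitions above can be made concrete with a toy example. The sketch below is a minimal illustration under our own assumptions, not the paper's construction: it uses a hypothetical language of integer linear expressions (constants, variables, addition, and multiplication by an integer literal), written as nested Python tuples, and a hypothetical function `embed`. Two such expressions compute the same function exactly when they have the same coefficient for every variable and the same constant term, so mapping each expression to its sorted, nonzero coefficients plus its constant sends equivalent programs to the same embedding and inequivalent programs to different ones. The traversal runs in polynomial time, and the embedding has at most one entry per distinct variable plus a constant (ignoring coefficient bit-width), so its size is bounded by the input length. A constant function would satisfy the invariance requirement trivially; the sketch assumes the intended notion should also separate inequivalent programs.

```python
from collections import Counter
from typing import Tuple

# Hypothetical toy language (an assumption for illustration, not from the paper):
#   Expr = ("const", int) | ("var", str) | ("add", Expr, Expr) | ("mul", int, Expr)
Expr = tuple


def _linear_form(e: Expr) -> Tuple[Counter, int]:
    """Return (coefficient of each variable, constant term) for expression e."""
    tag = e[0]
    if tag == "const":
        return Counter(), e[1]
    if tag == "var":
        return Counter({e[1]: 1}), 0
    if tag == "add":
        c1, k1 = _linear_form(e[1])
        c2, k2 = _linear_form(e[2])
        c1.update(c2)  # Counter.update adds coefficients (negatives allowed)
        return c1, k1 + k2
    if tag == "mul":
        k = e[1]
        c, const = _linear_form(e[2])
        return Counter({v: k * n for v, n in c.items()}), k * const
    raise ValueError(f"unknown node {tag!r}")


def embed(e: Expr) -> tuple:
    """Hypothetical equivalence-preserving embedding: a canonical linear form."""
    coeffs, const = _linear_form(e)
    # Drop zero coefficients so that, e.g., x + (-1)*x embeds exactly like 0.
    terms = tuple(sorted((v, n) for v, n in coeffs.items() if n != 0))
    return terms, const
```

For example, `x + y` and `y + x` receive the same embedding, as do `x + (-1)*x` and `0`, while `x` and `y` do not. The toy language is tractable because equivalence reduces to comparing canonical forms. If equivalence in a language is undecidable (for instance, one with unbounded loops), no computable embedding can be exact in both directions, since comparing embeddings would then decide equivalence; this is only an informal intuition, not the paper's characterization.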
Keywords
Programming Languages, Program Embeddings, Code, Big Code