Beyond Lexical Consistency: Preserving Semantic Consistency for Program Translation

23rd IEEE International Conference on Data Mining (ICDM 2023)

Abstract
Program translation aims to convert input programs from one programming language to another. Automatic program translation is a prized target of software engineering research because it promotes the reuse of existing projects and improves development efficiency. Recently, thanks to the rapid development of deep learning model architectures and the availability of large-scale parallel corpora of programs, the performance of program translation has greatly improved. However, existing program translation models are still far from satisfactory in terms of the quality of the translated programs. In this paper, we argue that a major limitation of current approaches is their lack of consideration of semantic consistency. Beyond lexical consistency, semantic consistency is also critical for the task. To make program translation models more semantically aware, we propose a general framework named Preserving Semantic Consistency for Program Translation (PSCPT), which accounts for semantic consistency through regularization in the training objective of program translation and can be easily applied to any encoder-decoder method with various neural networks (e.g., LSTM, Transformer) as the backbone. We conduct extensive experiments on 7 general programming languages. Experimental results show that, with CodeBERT as the backbone, our approach outperforms not only the state-of-the-art open-source models but also commercial closed-source large language models (e.g., text-davinci-002, text-davinci-003) on the program translation task. Our replication package (including code, data, etc.) is publicly available at https://github.com/duyali20011/PSCPT.
Keywords
Program Translation, Semantic Consistency, Regularization, Large Language Model