Represent Code as Action Sequence for Predicting Next Method Call

Asia-Pacific Symposium on Internetware (Internetware)（2022）

引用 0|浏览18

暂无评分

摘要

As human beings take actions with a goal in mind, we could predict the following action of a person depending on his previous actions. Inspired by this, after collecting and analyzing more than 13,000 repositories with 441,290 Python source code files from the Internet, we find the actions expressed in code are in the developers' high-level programming language statements. Previous code comprehension and code completion research paid little attention to code editing contexts like code file names and repository names while representing code for machine learning models. After modeling code as action sequences and modeling method names, file names and repository names as code editing context, we use modern natural language processing techniques to utilize the huge open source resources from the Internet and train a code completion model which takes the action sequences in code as input to complete code for developers. In the evaluation part, the experiments we conduct show the GPT-2 model trained with our action sequence code representation achieves 81.92% top-5 accuracy for next method call token prediction, compared to 61.89% of same GPT-2 model trained with same dataset. As for the context of the code we propose, we find it important for machines to comprehend the code better. Given the pre-trained natural language model, the training time of our model for 1,000,000 lines code is less than 16.7 minutes. All the above contribute to code comprehension and enhance code completion via unlimited resources from the Internet.

查看译文

关键词

mining software repositories, software engineering with Big data, software engineering with AI

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要