CloCom: Mining existing source code for automatic comment generation

Software Analysis, Evolution and Reengineering (2015)

Citations: 145 | Views: 71
Abstract
Code comments are an integral part of software development. They improve program comprehension and software maintainability. However, missing code comments are a common problem in the software industry, so generating code comments automatically would be beneficial. In this paper, we propose a general approach that generates code comments automatically by analyzing existing software repositories. We apply code clone detection techniques to discover similar code segments and use the comments of some code segments to describe the other, similar code segments. We leverage natural language processing techniques to select relevant comment sentences. In our evaluation, we analyze 42 million lines of code from 1,005 open source projects on GitHub and use them to generate 359 code comments for 21 Java projects. We manually evaluate the generated code comments and find that only 23.7% of them are good. We report to the developers those good code comments whose code segments do not already have a comment. Of the reported code comments, seven have been confirmed by the developers as good and committable to the software repository, while the rest are still awaiting the developers' confirmation. Although our approach can generate good and committable comments, we still have to improve its yield and accuracy before it can be used in practice with full automation.
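The abstract describes the pipeline only at a high level. The Python sketch below is a much-simplified, hypothetical illustration of the idea, not CloCom's implementation. It treats two code segments as clones when their token sequences match after every identifier and literal is replaced by a placeholder (roughly a Type-2, renamed-identifier clone), and it copies a commented clone's comment to an uncommented one. The paper's NLP-based sentence selection is replaced here by a simple first-sentence heuristic borrowed from the Javadoc summary-sentence convention. The names `Segment`, `normalize`, `first_sentence`, and `transfer_comments`, the abbreviated keyword list, and the sample Java snippets are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Abbreviated set of Java keywords kept verbatim (assumption: enough for this sketch).
JAVA_KEYWORDS = {
    "if", "else", "for", "while", "return", "new", "int", "boolean",
    "void", "public", "private", "static", "final", "null", "true", "false",
}

# String literals, numbers, identifiers, then any other single non-space character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')


@dataclass
class Segment:
    """Hypothetical container for a code segment and its (optional) comment."""
    name: str
    code: str
    comment: Optional[str] = None


def normalize(code: str) -> tuple:
    """Tokenize and abstract identifiers and literals, so renamed copies compare equal."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok.startswith('"'):
            tokens.append("STR")
        elif tok[0].isdigit():
            tokens.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in JAVA_KEYWORDS else "ID")
        else:
            tokens.append(tok)
    return tuple(tokens)


def first_sentence(comment: str) -> str:
    """Stand-in for sentence selection: keep the Javadoc-style summary sentence."""
    text = comment.strip()
    match = re.match(r"(.+?[.!?])(?:\s|$)", text, re.DOTALL)
    return match.group(1) if match else text


def transfer_comments(segments: list) -> dict:
    """Suggest a comment for each uncommented segment that clones a commented one."""
    donors = {}
    for seg in segments:
        if seg.comment:
            # Keep the first commented segment seen for each normalized token sequence.
            donors.setdefault(normalize(seg.code), seg)
    suggestions = {}
    for seg in segments:
        if seg.comment is None:
            donor = donors.get(normalize(seg.code))
            if donor is not None:
                suggestions[seg.name] = first_sentence(donor.comment)
    return suggestions


segments = [
    Segment("Stats.max",
            "int best = a[0]; for (int i = 1; i < a.length; i++) "
            "if (a[i] > best) best = a[i]; return best;",
            "Returns the largest element of the array. Assumes it is non-empty."),
    Segment("Util.peak",
            "int m = xs[0]; for (int j = 1; j < xs.length; j++) "
            "if (xs[j] > m) m = xs[j]; return m;"),
    Segment("Util.sum", "int s = 0; for (int v : xs) s += v; return s;"),
]
print(transfer_comments(segments))
# {'Util.peak': 'Returns the largest element of the array.'}
```

Exact matching on normalized tokens only finds clones whose tokens line up one-to-one. Segments with inserted or deleted statements would need a gap-tolerant clone detector, and a real sentence selector would also need to check that the copied text actually fits the target code.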
Keywords
Java, data mining, natural language processing, public domain software, software maintenance, source code (software), CloCom, GitHub, Java projects, automatic comment generation, code clone detection techniques, code comments, existing source code mining, natural language processing techniques, open source projects, program comprehension, relevant comment sentences, similar code segments, software development, software industry, software maintainability, software repositories, software repository, comment generation, documentation