Learning Semantic Features For Software Defect Prediction By Code Comments Embedding

2018 IEEE International Conference on Data Mining (ICDM) (2018)

Abstract
Software Quality Assurance (SQA) is essential in software development, and many machine-learning-based defect prediction methods have been proposed to identify defective modules. However, most existing defect prediction models do not provide good prediction results, and the semantic features that reflect defect patterns may not be well captured by traditional feature extraction methods. Additional information such as code comments should also be embedded to generate semantic features that respect the functionality of the source code. How to embed code comments for defect prediction is therefore a major challenge; a further problem is that many source code comments are missing in real-world applications. In this paper, we propose a novel defect prediction model named CAP-CNN (Convolutional Neural Network for Comments Augmented Programs), a deep learning model that automatically embeds code comments when generating semantic features from source code for software defect prediction. To overcome the missing-comments problem, CAP-CNN uses a novel training strategy in which the network encodes and absorbs comment information to generate semantic features automatically during the training process, so that testing modules do not need to contain comments. Experimental results on several widely used software data sets indicate that the comment features are able to improve defect prediction performance.
Keywords
data mining, software mining, software defect prediction