Semantic feature learning for software defect prediction from source code and external knowledge

Journal of Systems and Software (2023)

Abstract
Software defects reduce operational reliability and significantly increase overall maintenance costs, so it is important to predict them at an early stage. Existing software defect prediction studies perform classification using either hand-crafted metrics or features extracted from source code by machine learning-based approaches. However, these methods do not make full use of defect-related information outside the code itself, such as code comments and commit messages. In this paper, additional information extracted from natural language text is therefore combined with the programming language code to enrich the semantic features. A novel model based on the Transformer architecture and a multi-channel CNN, PM2-CNN, is proposed for software defect prediction. The model uses a pretrained language model to obtain context-sensitive representations and a CNN-based classifier to capture local correlations within sequences. The effectiveness of the proposed method is verified on a large and widely used dataset. The results show that the proposed method improves on the best-performing baseline method in generic evaluation metrics. Accordingly, external information can have a positive impact on software defect prediction, and the proposed model effectively incorporates such information to improve detection performance.
Keywords
software defect prediction, semantic feature, source code, external knowledge