Semi-supervised detection of Long Method and God Class code smells

Ilija Brdar, Jelena Vlajkov,Jelena Slivka,Katarina-Glorija Grujić,Aleksandar Kovačević

2022 IEEE 20th Jubilee International Symposium on Intelligent Systems and Informatics (SISY)（2022）

引用 0|浏览22

暂无评分

摘要

Code smells are poorly designed parts of code whose removal is essential for sustainable software development. However, recognizing code smells in practice is challenging. Machine Learning (ML)-based code smell detectors could solve this problem. Current ML-based code smell detection approaches are based on supervised learning (SL) that requires a large and diverse dataset for training. Unfortunately, the existing code smell datasets are small, which hinders the performance of the trained SL models. This paper aims to improve the performance of ML-based code smell detectors by employing semi-supervised learning (SSL). SSL models are trained by combining a manually labeled code smell dataset with unlabeled code snippets collected from open-source repositories. Two major SSL techniques are employed: self-training and co-training. Experiments were performed for two code smell types: God Class and Long Method. SSL classifiers significantly outperformed SL classifiers for God Class detection (by 6% F-measure). For Long Method detection, SSL classifiers slightly outperformed SL classifiers (by 1%F-measure). This paper is the first to consider applying SSL for code smell detection. SSL models outperforming SL models in all experiments suggest that SSL holds the great potential to improve current code smell detectors, which is essential for their adoption in practice.

查看译文

关键词

code smells,machine learning,semi-supervised learning,code metrics,code embeddings

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要