ml-Codesmell: A code smell prediction dataset for machine learning approaches.

SoICT(2022)

Abstract
In recent years, many studies on detecting code smells in source code have published datasets with notable limitations: ambiguous code smell definitions lead to different interpretations of each code smell, the datasets contain few samples, and their features are heterogeneous. As a result, comparing the performance of code smell detection models is challenging, and the datasets are often not reusable in other code smell detection studies. In this work, we propose the ml-Codesmell dataset, created by analyzing source code and extracting a large number of source code metrics together with many labelled code smells. The proposed dataset has been used to train machine learning algorithms and predict code smells. Based on the high F1-scores obtained in evaluation, the ml-Codesmell dataset demonstrates a strong correlation between features and labels. Given these advantages, the ml-Codesmell dataset is expected to be helpful for studies on detecting code smells with machine learning approaches in software development.
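For readers unfamiliar with the metric, the F1-score mentioned in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch of how it could be computed for a binary "smelly / not smelly" label; the function name `f1_score` and the toy labels are hypothetical illustrations and are not taken from the ml-Codesmell dataset or its evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute the F1-score for one positive class (e.g. "smell present").

    y_true and y_pred are equal-length sequences of labels. The label
    encoding (1 = smelly, 0 = clean) is an assumption for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # With no true positives, precision and recall are zero or undefined;
    # return 0.0 by convention instead of dividing by zero.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented purely for illustration.
example_true = [1, 1, 1, 0, 0, 0]
example_pred = [1, 1, 0, 1, 0, 0]
example_f1 = f1_score(example_true, example_pred)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is also 2/3.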