An Empirical Validation Of Learning Schemes Using An Automated Genetic Defect Prediction Framework

Lecture Notes in Computer Science (2016)

Abstract
Today, it is common for software projects to collect measurement data throughout the development process. With these data, defect prediction software can estimate the defect proneness of a software module, with the objective of assisting and guiding software practitioners. With timely and accurate defect predictions, practitioners can focus their limited testing resources on higher-risk areas. This paper reports a benchmarking study that uses a genetic algorithm to automatically generate and compare different learning schemes (preprocessing + attribute selection + learning algorithm). The performance of the defect prediction models, measured by AUC (Area Under the Curve), was validated on the NASA-MDP and PROMISE data sets. Twelve data sets from the NASA-MDP (8) and PROMISE (4) projects were analyzed using M x N-fold cross-validation. We used a genetic algorithm to select the components of the learning schemes automatically, and to evaluate and report those with the best performance. In all, 864 learning schemes were studied. The most common components of the best-performing schemes were: data preprocessors: Log and Box-Cox; attribute selectors: Backward Elimination, BestFirst, and LinearForwardSelection; learning algorithms: NaiveBayes, NaiveBayesSimple, SimpleLogistic, MultilayerPerceptron, Logistic, LogitBoost, BayesNet, and OneR. According to the statistical analysis, the genetic algorithm showed steady performance and runtime across data sets.
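The abstract outlines the core loop: encode each learning scheme as a (preprocessor, attribute selector, learner) triple, score it with AUC under repeated cross-validation, and evolve a population of such triples. The sketch below illustrates that idea in Python with scikit-learn stand-ins; the component pools, GA operators, population size, and the signed-log/Yeo-Johnson transforms are illustrative assumptions, not the authors' exact WEKA-based setup (which drew on classifiers such as NaiveBayes and SimpleLogistic and selectors such as BestFirst).

```python
import random
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, PowerTransformer

# Search space: each gene picks one component of a learning scheme.
# These pools are illustrative stand-ins for the paper's WEKA components.
PREPROCESSORS = {
    "log": FunctionTransformer(lambda X: np.sign(X) * np.log1p(np.abs(X))),  # signed log
    "power": PowerTransformer(method="yeo-johnson"),  # stand-in for Box-Cox
}
SELECTORS = {
    "top5": SelectKBest(f_classif, k=5),
    "top10": SelectKBest(f_classif, k=10),
}
LEARNERS = {
    "nb": GaussianNB(),
    "logistic": LogisticRegression(max_iter=1000),
}
GENE_POOLS = [list(PREPROCESSORS), list(SELECTORS), list(LEARNERS)]

def fitness(genes, X, y):
    """Mean AUC of a scheme under repeated stratified cross-validation."""
    pipe = Pipeline([
        ("prep", PREPROCESSORS[genes[0]]),
        ("select", SELECTORS[genes[1]]),
        ("learn", LEARNERS[genes[2]]),
    ])
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
    return cross_val_score(pipe, X, y, scoring="roc_auc", cv=cv).mean()

def evolve(X, y, pop_size=8, generations=5, mutation_rate=0.2):
    pop = [[random.choice(pool) for pool in GENE_POOLS] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda g: fitness(g, X, y), reverse=True)
        elite = ranked[: pop_size // 2]  # selection: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if random.random() < mutation_rate:  # point mutation on one gene
                i = random.randrange(len(GENE_POOLS))
                child[i] = random.choice(GENE_POOLS[i])
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda g: fitness(g, X, y))

if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    print("best scheme:", evolve(X, y))
```

The paper's full search space comprised 864 such scheme combinations; a population-based search of this kind is one common way to explore that space and surface the best-performing schemes without hand-picking components.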
Keywords
Software quality, Fault prediction models, Genetic algorithms, Learning schemes, Learning algorithms, Machine learning