A Multidimensional Analysis of Bug Density in SAP HANA

Julian Reck,Thomas Bach, Jan Stoess

PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023(2023)

引用 0|浏览0
暂无评分
摘要
Researchers and practitioners have been studying correlations between software metrics and defects for decades. The typical approach is to postulate a hypothesis that a certain metric correlates with the number of defects. A statistical test then utilizes historical data to accept or reject the hypothesis. Although this methodology has been widely adopted, our own experience is that such correlations are often limited in their practical relevance, particularly for large industrial projects: Interpreting and arguing about them is challenging and cumbersome; the difference between correlation and causation might not be clear; and the practical impact of a correlation is often questioned due to misconceptions between a statistical conclusion and the impact on singular events. Instead of discussing correlations, we found that the analysis for binary testedness results in more fruitful discussions. Binary testedness, as proposed by prior work, utilizes a metric to divide the source code into two parts and verifies whether more (or less) defects appear in each part than expected. In our work, we leverage the binary testedness approach and analyze several software metrics for a large industrial project to illustrate the concept. We furthermore introduce dynamic thresholds as a novel and more practical approach for source code classification compared to the static binary classification of previous works. Our results show that some studied metrics have a significant correlation with bug distribution, but effect sizes differ by several magnitudes across metrics. Overall, our approach moves away from "metric X correlates with defects" to a more fruitful "source code with attribute X has more (or less) bugs than expected", reframing the discussion from questioning statistics and methods towards an evidence-based root cause analysis.
更多
查看译文
关键词
empirical study,bug density,software quality,database
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要