A Case Study of Bias in Bug-Fix Datasets

Reverse Engineering(2010)

引用 68|浏览0
暂无评分
摘要
Software quality researchers build software quality models by recovering traceability links between bug reports in issue tracking repositories and source code files. However, all too often the data stored in issue tracking repositories is not explicitly tagged or linked to source code. Researchers have to resort to heuristics to tag the data (e.g., to determine if an issue is a bug report or a work item), or to link a piece of code to a particular issue or bug. Recent studies by Bird et al. and by Antoniol et al. suggest that software models based on imperfect datasets with missing links to the code and incorrect tagging of issues, exhibit biases that compromise the validity and generality of the quality models built on top of the datasets. In this study, we verify the effects of such biases for a commercial project that enforces strict development guidelines and rules on the quality of the data in its issue tracking repository. Our results show that even in such a perfect setting, with a near-ideal dataset, biases do exist – leading us to conjecture that biases are more likely a symptom of the underlying software development process instead of being due to the used heuristics.
更多
查看译文
关键词
bug-fix datasets,particular issue,software quality researcher,issue tracking repository,source code,quality model,bug report,case study,source code file,software quality model,software model,underlying software development process,software engineering,prediction,bias,software quality,source coding,computer bugs,couplings,software development process,sample,data quality
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要