On the Vocabulary Agreement in Software Issue Descriptions

2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2016

Abstract
Many software comprehension tasks depend on how stakeholders textually describe their problems. These textual descriptions are leveraged by Text Retrieval (TR)-based solutions to more than 20 software engineering tasks, such as duplicate issue detection. The common assumption of such methods is that text describing the same issue in multiple places will have a common vocabulary. This paper presents an empirical study aimed at verifying this assumption and discusses the impact of the common vocabulary on duplicate issue detection. The study investigated 13K+ pairs of duplicate bug reports and Stack Overflow (SO) questions. We found that on average, more than 12.2% of the duplicate pairs do not have common terms. The other duplicate issue descriptions share, on average, 30% of their vocabulary. The good news is that these duplicates have significantly more terms in common than the non-duplicates. We also found that the difference between the lexical agreement of duplicate and non-duplicate pairs is a good predictor for the performance of TR-based duplicate detection.
Keywords
vocabulary agreement, software issue descriptions, software comprehension tasks, textual descriptions, text retrieval, TR-based solutions, software engineering tasks, duplicate issue detection, duplicate bug reports, Stack Overflow
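
The notion of shared vocabulary between two issue descriptions can be illustrated with a minimal sketch. The abstract does not state the exact agreement measure used in the study, so the code below assumes a simple Jaccard-style overlap on lowercased alphanumeric tokens; the function names `terms` and `vocabulary_agreement` are hypothetical and chosen only for illustration.

```python
import re


def terms(text):
    """Return the set of lowercased alphanumeric tokens in a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def vocabulary_agreement(desc_a, desc_b):
    """Fraction of the combined vocabulary that both descriptions share.

    Assumption: Jaccard overlap (shared terms / all distinct terms).
    This is an illustrative stand-in, not necessarily the paper's measure.
    """
    terms_a, terms_b = terms(desc_a), terms(desc_b)
    union = terms_a | terms_b
    if not union:
        # Two empty descriptions have no vocabulary to agree on.
        return 0.0
    return len(terms_a & terms_b) / len(union)


# A pair that describes the same problem with mostly the same words.
print(vocabulary_agreement("app crashes on login", "login crashes the app"))

# A pair with no common terms, like the duplicates the study highlights.
print(vocabulary_agreement("null pointer exception", "screen freezes"))
```

A real TR-based pipeline would typically also apply stop-word removal and stemming before comparing term sets; those steps are left out here to keep the sketch short.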