Inside Commits - An Empirical Study on Commits in Open-Source Software.

SBES (2021)

Abstract
GitHub is currently the most popular open-source software hosting platform, containing about 20 million public repositories. Many studies rely on data mined from GitHub repositories, especially commits. However, not knowing the characteristics of commits may introduce biases and threats to validity in those studies. This work presents an empirical study that characterizes commits in terms of three aspects: the categories of activities performed in commits, the co-occurrence of activities within commits, and the size of commits by category. We analyzed 1M commits from the 24 most popular and most active Java-based projects hosted on GitHub. The main findings are that reengineering is the most frequent activity; that 30% of commits involve more than one type of activity; and that the most common co-occurrence of activities in commits is reengineering with forwarding and corrective reengineering, although at a low rate of only 8%. Empirical studies that rely on commit data should take these results into account to avoid threats and biases.