GitHub Label Embeddings

2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM) (2020)

Abstract
GitHub repository issues can be “tagged” with labels to aid understanding, organization, and classification, and to make information retrieval easier for both users and project managers. GitHub provides nine default labels and allows users to create, edit, and delete labels to fit the project maintainers' management goals. Such labels can, for example, help users find open source projects that welcome new collaborators, since they can search for the default label good first issue in GitHub's search engine. However, such a mechanism would be more powerful if the platform recognized semantically similar customized labels and also returned the projects that use them. In this study, we investigate two NBNE-based approaches and another based on the Word2Vec algorithm to represent labels as embeddings (i.e., as vectors in a multidimensional space), so that semantically similar labels lie closer together. We found that Word2Vec is better suited to this task, although this result deserves further investigation.
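
To make the idea of "semantically similar labels lie closer together" concrete, the following is a minimal sketch of a nearest-label lookup by cosine similarity. It is not the method of the paper: the vectors, the custom label names other than good first issue, and the helper names `cosine_similarity` and `most_similar` are all hypothetical, chosen only for illustration. In practice the vectors would come from a trained model such as Word2Vec or NBNE.

```python
import math

# Hypothetical 3-dimensional label embeddings, invented for illustration only.
# Real vectors would be learned by a model such as Word2Vec or NBNE.
EMBEDDINGS = {
    "good first issue": [0.9, 0.1, 0.0],
    "beginner friendly": [0.8, 0.2, 0.1],
    "bug": [0.0, 0.9, 0.3],
    "defect": [0.1, 0.8, 0.4],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(label, embeddings):
    """Return (other_label, similarity) for the label closest to `label`."""
    query = embeddings[label]
    candidates = (
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != label  # skip the query label itself
    )
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name in EMBEDDINGS:
        print(name, "->", most_similar(name, EMBEDDINGS))
```

With embeddings of this kind, a search for good first issue could be widened to include the custom labels whose vectors lie nearest to it.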
Keywords
Software Engineering,Repository Mining,Embeddings,Representation Learning