HFCommunity: An extraction process and relational database to analyze Hugging Face Hub data

Science of Computer Programming (2024)

Abstract
Social coding platforms such as GitHub or GitLab have become the de facto standard for developing Open-Source Software (OSS) projects. With the emergence of Machine Learning (ML), platforms specifically designed for hosting and developing ML-based projects have appeared, with Hugging Face Hub (HFH) being one of the most popular. HFH aims at sharing datasets, pre-trained ML models, and the applications built with them. With over 400K repositories, and growing fast, HFH is becoming a promising source of empirical data on all aspects of ML project development. However, apart from the API provided by the platform, there are no easy-to-use solutions to collect the data, nor prepackaged datasets to explore the different facets of HFH. We present HFCommunity, an extraction process for HFH data and a relational database to facilitate the empirical analysis of the growing number of ML projects.

Keywords
Mining software repositories, Data analysis, Hugging Face