HFCommunity: An extraction process and relational database to analyze Hugging Face Hub data
Science of Computer Programming (2024)
Abstract
Social coding platforms such as GitHub or GitLab have become the de facto standard for developing Open-Source Software (OSS) projects. With the emergence of Machine Learning (ML), platforms specifically designed for hosting and developing ML-based projects have appeared, Hugging Face Hub (HFH) being one of the most popular. HFH aims at sharing datasets, pre-trained ML models, and the applications built with them. With over 400K repositories and growing fast, HFH is becoming a promising source of empirical data on all aspects of ML project development. However, apart from the API provided by the platform, there are no easy-to-use solutions for collecting the data, nor prepackaged datasets for exploring the different facets of HFH. We present HFCommunity, an extraction process for HFH data and a relational database to facilitate empirical analysis of the growing number of ML projects.
Keywords
Mining software repositories, Data analysis, Hugging Face