SecretHunter: A Large-scale Secret Scanner for Public Git Repositories.

TrustCom(2022)

引用 2|浏览9
暂无评分
摘要
Collaborative software development platforms like GitHub have gained tremendous popularity. Unfortunately, many users have reportedly leaked authentication secrets (e.g., textual passwords and API keys) in public Git repositories and caused security incidents and finical loss. Recently, several tools were built to investigate the secret leakage in GitHub. However, these tools could only discover and scan a limited portion of files in GitHub due to platform API restrictions and band-width limitations. In this paper, we present SecretHunter, a real-time large-scale comprehensive secret scanner for GitHub. SecretHunter resolves the file discovery and retrieval difficulty via two major improvements to the Git cloning process. Firstly, our system will retrieve file metadata from repositories before cloning file contents. The early metadata access can help identify newly committed files and enable many bandwidth optimizations such as filename filtering and object deduplication. Secondly, SecretHunter adopts a reinforcement learning model to analyze file contents being downloaded and infer whether the file is sensitive. If not, the download process can be aborted to conserve bandwidth. We conduct a one-month empirical study to evaluate SecretHunter. Our results show that SecretHunter discovers 57% more leaked secrets than state-of-the-art tools. SecretHunter also reduces 85% bandwidth consumption in the object retrieval process and can be used in low-bandwidth settings (e.g., 4G connections).
更多
查看译文
关键词
n/a
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要