Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study
CoRR(2024)
摘要
Collecting relevant and high-quality data is integral to the development of
effective Software Vulnerability (SV) prediction models. Most of the current SV
datasets rely on SV-fixing commits to extract vulnerable functions and lines.
However, none of these datasets have considered latent SVs existing between the
introduction and fix of the collected SVs. There is also little known about the
usefulness of these latent SVs for SV prediction. To bridge these gaps, we
conduct a large-scale study on the latent vulnerable functions in two commonly
used SV datasets and their utilization for function-level and line-level SV
predictions. Leveraging the state-of-the-art SZZ algorithm, we identify more
than 100k latent vulnerable functions in the studied datasets. We find that
these latent functions can increase the number of SVs by 4x on average and
correct up to 5k mislabeled functions, yet they have a noise level of around
6
can significantly benefit from such latent SVs. The improvements are up to
24.5
67
presents the first promising step toward the use of latent SVs to improve the
quality of SV datasets and enhance the performance of SV prediction tasks.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要