Large-Scale Empirical Studies on Effort-Aware Security Vulnerability Prediction Methods

IEEE Transactions on Reliability (2020)

Abstract
Security vulnerability prediction (SVP) can identify potentially vulnerable modules in advance and thus help developers allocate most of the test resources to these modules. To evaluate the performance of different SVP methods, we should take security audit and code inspection effort into account and therefore consider effort-aware performance measures (such as $ACC$ and $P_{\rm opt}$). However, to the best of our knowledge, the effectiveness of different SVP methods has not been thoroughly investigated in terms of effort-aware performance measures. In this article, we consider 48 different SVP methods, of which 36 are supervised methods and 12 are unsupervised methods. For the supervised methods, we consider 34 software-metric-based methods and two text-mining-based methods. For the software-metric-based methods, in addition to a large number of classification methods, we also consider four state-of-the-art methods (i.e., EALR, OneWay, CBS, and MULTI) proposed in recent effort-aware just-in-time defect prediction studies. For the text-mining-based methods, we consider the bag-of-words model and the term frequency–inverse document frequency (TF-IDF) model. For the unsupervised methods, all the modules are ranked in ascending order based on a specific metric. Since 12 software metrics are considered when measuring the extracted modules, there are 12 different unsupervised methods. To the best of our knowledge, over 40 of these SVP methods have not been considered in previous SVP studies. In our large-scale empirical studies, we use three real open-source web applications written in PHP as benchmarks. These three web applications include 3466 modules and 223 vulnerabilities in total. We evaluate these SVP methods both in the within-project SVP scenario and the cross-project SVP scenario.
Empirical results show that two unsupervised methods [i.e., lines of code (LOC) and Halstead's volume (HV)] and four recently proposed state-of-the-art supervised methods (i.e., MULTI, OneWay, CBS, and EALR) can achieve better performance than the other methods in terms of effort-aware performance measures. We then analyze the reasons why these six methods achieve better performance. For example, when using 20% of the entire effort, we find that these six methods can always inspect more modules, especially the unsupervised methods LOC and HV. Finally, from the perspective of practical vulnerability localization, we find that all the unsupervised methods and the OneWay method produce many false alarms before finding the first vulnerable module. This may affect developers' confidence and tolerance, so supervised methods (especially MULTI and the text-mining-based methods) are preferred.
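As an illustration of the effort-aware setup the abstract describes, the sketch below ranks modules in ascending order of a single metric (here LOC, as an unsupervised method would) and computes an ACC-style score: the fraction of vulnerable modules found within 20% of the total inspection effort, with total LOC standing in for effort. The function name, data layout, and sample values are hypothetical, not taken from the paper's benchmark.

```python
def acc_at_effort(modules, effort_ratio=0.2):
    """ACC-style effort-aware recall (illustrative, not the paper's exact code).

    modules: list of (loc, is_vulnerable) pairs.
    Modules are inspected in ascending order of LOC until the inspected
    LOC would exceed effort_ratio of the total LOC; the score is the
    fraction of all vulnerable modules found within that budget.
    """
    total_loc = sum(loc for loc, _ in modules)
    total_vuln = sum(1 for _, vuln in modules if vuln)
    budget = effort_ratio * total_loc
    spent = found = 0
    for loc, vuln in sorted(modules, key=lambda m: m[0]):  # ascending order
        if spent + loc > budget:
            break
        spent += loc
        found += vuln
    return found / total_vuln if total_vuln else 0.0

# Toy example: small modules are inspected first under the LOC ranking,
# so 2 of the 3 vulnerable modules fit within the 20% effort budget.
sample = [(10, True), (50, False), (200, True), (800, False), (40, True)]
print(acc_at_effort(sample))  # → 0.666...
```

Ranking ascending by LOC favors inspecting many small modules first, which is why the LOC and HV baselines can cover more modules under the same effort budget, at the cost of the high initial false-alarm counts the abstract notes.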
Keywords
Security, Software metrics, Correlation, Open source software