PATVD: Vulnerability Detection Based on Pre-training Techniques and Adversarial Training

SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta(2022)

Abstract
Software vulnerability detection has attracted growing attention. Traditional vulnerability detection methods require substantial expertise to define vulnerability features, whereas deep learning based methods extract features automatically. These deep learning based techniques treat code fragments as a sequence of tokens or as a single graph structure, which lets them perform well on synthetic or semi-synthetic datasets. In real-world scenarios, however, existing deep learning based techniques still suffer from limitations. In this work, we propose PATVD, a vulnerability detection method that combines pre-training techniques and adversarial training. We first use a pre-trained model to combine source code fragments with their corresponding data flow graphs, so that both code structure and the token sequence are integrated to learn the more complex vulnerability information contained in real-world datasets. Then, to address the imbalanced distribution of real-world datasets, we apply adversarial training in the code embedding space as a form of regularization to improve model generalization. Finally, the code representation produced with adversarial training is passed to the classification model. To validate our method, we run comparative experiments on both synthetic datasets and recently published real-world datasets. The experimental results show that PATVD outperforms state-of-the-art methods on both synthetic and real-world datasets, and that the adversarial training module yields additional gains on real-world datasets.
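The abstract does not say which adversarial training scheme PATVD uses, so the following is only a minimal sketch of the general idea of perturbing a code embedding as regularization. It assumes a Fast Gradient Method (FGM) style perturbation and a toy logistic classifier over a fixed-size embedding; the names `loss_and_grads`, `fgm_perturb`, `adversarial_step`, and the parameters `epsilon` and `lr` are hypothetical and not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grads(w, b, x, y):
    """Binary cross-entropy of a toy logistic classifier on embedding x.

    Returns (loss, dL/dw, dL/db, dL/dx). The classifier stands in for
    the real classification model, which the abstract does not specify.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    err = p - y  # derivative of the loss with respect to the logit
    grad_w = [err * xi for xi in x]
    grad_x = [err * wi for wi in w]
    return loss, grad_w, err, grad_x


def fgm_perturb(x, grad_x, epsilon):
    """Shift the embedding by epsilon along the normalized loss gradient."""
    norm = math.sqrt(sum(g * g for g in grad_x))
    if norm == 0.0:
        return list(x)
    return [xi + epsilon * g / norm for xi, g in zip(x, grad_x)]


def adversarial_step(w, b, x, y, epsilon=0.5, lr=0.1):
    """One update that sums the clean and adversarial gradients (assumed scheme)."""
    clean_loss, gw, gb, gx = loss_and_grads(w, b, x, y)
    x_adv = fgm_perturb(x, gx, epsilon)
    adv_loss, gw_adv, gb_adv, _ = loss_and_grads(w, b, x_adv, y)
    new_w = [wi - lr * (g1 + g2) for wi, g1, g2 in zip(w, gw, gw_adv)]
    new_b = b - lr * (gb + gb_adv)
    return new_w, new_b, clean_loss, adv_loss


# Toy usage: a 2-dimensional "code embedding" labeled as vulnerable (y = 1).
w, b = [0.5, -0.3], 0.0
x, y = [1.0, 2.0], 1
w, b, clean_loss, adv_loss = adversarial_step(w, b, x, y)
```

In this sketch the perturbation is applied to the embedding rather than to the source tokens, which mirrors the abstract's description of adversarial training "on the code embedding space"; a real system would compute the gradient through the full pre-trained encoder.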
Keywords
vulnerability detection, pre-training techniques, adversarial training