Detecting Vulnerabilities in Source Code Using Machine Learning

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON APPLIED CYBER SECURITY (ACS) 2021 (2022)

Abstract
In recent years, software vulnerabilities have been the source of countless cyber attacks. Despite the existence of various methods for detecting software vulnerabilities, such as static and dynamic analysis tools, the number of vulnerabilities discovered each year continues to climb. Over the last decade, machine learning models have made significant progress. Machine learning methods do not need human experts to define features; they can learn vulnerability patterns automatically and may capture patterns that humans do not understand. The goal of this paper is to create a machine learning model that can discover vulnerabilities at function scope (i.e. a method or procedure in any programming language). To accomplish this, we propose a novel feature extraction technique based on clustering the vocabulary of the function text using K-means. The vulnerability classification problem is typically imbalanced, as most functions are not vulnerable. On our dataset, we compare the effect of four class imbalance handling techniques: class weighting, random undersampling, random oversampling, and the Synthetic Minority Oversampling Technique (SMOTE). Results show that class weighting performs best on both metrics we use, with a recall of 76% and an F1 score of 17%. The results also show that our method is comparable to other methods while training faster, because it uses a shallow machine learning model.
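Two of the imbalance handling techniques named in the abstract, class weighting and random oversampling, can be illustrated with a minimal sketch. The abstract does not say how the weights were computed, so the inverse-frequency formula below (n_samples / (n_classes * class_count)) is an assumption, and the function names and toy data are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical toy data: 1 = vulnerable function, 0 = not vulnerable.
labels = [0] * 95 + [1] * 5
samples = [f"func_{i}" for i in range(100)]

weights = balanced_class_weights(labels)
res_x, res_y = random_oversample(samples, labels)
```

With class weighting the dataset is left unchanged and the rare vulnerable class simply counts for more in the loss, whereas oversampling changes the training set itself by repeating minority samples.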
Keywords
vulnerabilities, source code, machine learning