Embedding vector generation based on function call graph for effective malware detection and classification

Neural Computing and Applications (2022)

Abstract
The surge of malware poses a serious threat to cyberspace security. Existing machine-learning-based malware analysis methods rely mainly on feature engineering: they must extract many handcrafted features from the malware to improve accuracy, which increases the complexity of the analysis. To address this problem, this paper proposes GEMAL, a new malware analysis method based on the function call graph (FCG) and a graph embedding network. The FCG captures the structural information of a binary file and has been used in a range of malware analysis studies. Inspired by natural language processing tasks, we treat instructions as words and functions as sentences, so that semantic features can be extracted automatically with natural language processing methods. We use an attention-based graph embedding network to combine structural and semantic features into embedding vectors of malware for automatic and efficient malware analysis. We evaluate GEMAL on two datasets. One is a self-created dataset named WUFCG, which contains 70,188 real-world samples. The other is the public dataset of the Microsoft Malware Classification Challenge, which contains 10,868 samples. Experimental results show that GEMAL detects real-world malware with 99.16% accuracy and classifies malware families with a best accuracy of 99.81%.
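The idea described in the abstract (instructions as words, functions as sentences, attention over the call graph) can be illustrated with a minimal sketch. This is not GEMAL's implementation: the hashed token vectors stand in for learned instruction embeddings, the dot-product attention and mean pooling are assumptions, and every name here (`token_vector`, `function_vector`, `attention_step`, `embed_binary`, `DIM`) is hypothetical.

```python
import hashlib
import math

DIM = 8  # hypothetical embedding size, chosen only for illustration


def token_vector(token, dim=DIM):
    """Deterministic pseudo-embedding in [-1, 1] for one instruction token.

    Assumption: a hash stands in for a learned word embedding.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:dim]]


def function_vector(instructions, dim=DIM):
    """Treat a function as a 'sentence': average its instruction vectors."""
    if not instructions:
        return [0.0] * dim
    vecs = [token_vector(t, dim) for t in instructions]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_step(node_vecs, call_graph):
    """One round of attention-weighted aggregation over a function and its callees."""
    updated = {}
    for name, vec in node_vecs.items():
        # Each function attends to itself and to the functions it calls.
        neighbours = [name] + [c for c in call_graph.get(name, []) if c in node_vecs]
        weights = softmax([dot(vec, node_vecs[n]) for n in neighbours])
        updated[name] = [
            sum(w * node_vecs[n][i] for w, n in zip(weights, neighbours))
            for i in range(len(vec))
        ]
    return updated


def embed_binary(functions, call_graph, rounds=2):
    """Combine semantic (instruction) and structural (call graph) information
    into one vector by mean-pooling the final function vectors."""
    node_vecs = {name: function_vector(ins) for name, ins in functions.items()}
    for _ in range(rounds):
        node_vecs = attention_step(node_vecs, call_graph)
    vecs = list(node_vecs.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Toy input: three functions with mnemonic sequences and their call edges.
functions = {
    "main": ["push", "mov", "call", "ret"],
    "decrypt": ["xor", "mov", "loop", "ret"],
    "send": ["push", "call", "ret"],
}
call_graph = {"main": ["decrypt", "send"], "decrypt": [], "send": []}
embedding = embed_binary(functions, call_graph)
```

In a real pipeline the resulting vector would be passed to a classifier for detection or family classification; here it only shows how one fixed-length vector can summarize both the instructions and the call structure of a binary.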
Keywords
Malware detection, Malware classification, Function call graph, Graph embedding, Attention mechanism