DitDetector: Bimodal Learning based on Deceptive Image and Text for Macro Malware Detection.

Jia Yan, Ming Wan, Xiangkun Jia, Lingyun Ying, Purui Su, Zhanyi Wang

ACSAC (2022)

Abstract
Macro malware has always been a severe threat to cyber security, even though the Microsoft Office suite disables macros by default. Among the defense solutions at different stages of the attack chain, document analysis is more targeted because it detects the malicious documents that carry macro malware. It is effective, especially with machine learning methods, but still has trouble handling malware variants, supporting multiple file formats, and countering advanced attack techniques (e.g., Excel 4.0 macros and remote template injection). In this paper, we find it promising to detect the deceptive information embedded in documents, which tricks users into enabling macros, instead of analyzing file metadata or extracted macro code. Thus, we propose a novel solution for macro malware detection named DitDetector, which leverages bimodal learning based on deceptive images and text. Specifically, we extract preview images of documents using an image export SDK from Oracle, and extract textual information from the preview images using an open-source OCR engine. The bimodal model of DitDetector contains a visual encoder, a textual encoder, and a forward neural network, which learns from the joint representation of the two encoders' outputs. We evaluate DitDetector on three datasets: an open-source malicious document dataset (i.e., MalDoc) and two collected real-world adversary datasets (i.e., a database of Excel macros and a database of remote template injection samples). Our experiments show that DitDetector outperforms four existing macro code-based machine learning methods and five reputable anti-virus engines. In particular, in the real-world test on advanced macro malware, DitDetector achieves an F1-score of 99.93%, which is at least 3.16% higher than the compared solutions.
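The bimodal design described in the abstract (two encoders whose outputs feed one network) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the image and text embeddings have already been produced by some visual and textual encoder, that the joint representation is a simple concatenation, and that the network has one hidden layer; the names `BimodalClassifier`, `init_layer`, and `predict_proba` are hypothetical, and the weights are random and untrained.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class BimodalClassifier:
    """Illustrative fusion head: concatenate two embeddings, then classify.

    Assumption: fusion by concatenation and a single hidden layer.
    The abstract does not specify either choice.
    """

    def __init__(self, image_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.image_dim = image_dim
        self.text_dim = text_dim
        self.hidden = init_layer(rng, image_dim + text_dim, hidden_dim)
        self.out = init_layer(rng, hidden_dim, 1)

    def predict_proba(self, image_embedding, text_embedding):
        """Return a score in (0, 1); untrained, so the value is arbitrary."""
        if (len(image_embedding) != self.image_dim
                or len(text_embedding) != self.text_dim):
            raise ValueError("embedding size does not match the model")
        # Joint representation: image features followed by text features.
        joint = list(image_embedding) + list(text_embedding)
        hidden = relu(linear(joint, *self.hidden))
        logit = linear(hidden, *self.out)[0]
        return sigmoid(logit)


model = BimodalClassifier(image_dim=4, text_dim=3, hidden_dim=5)
score = model.predict_proba([0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.5])
print(f"malicious score (untrained): {score:.3f}")
```

In a real system the two embeddings would come from trained encoders applied to the document preview image and its OCR text, and the weights would be learned from labeled documents rather than drawn at random.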
Keywords
malware detection, macro malware, bimodal learning