Can a Deep Learning Model for One Architecture Be Used for Others? Retargeted-Architecture Binary Code Analysis

Junzhe Wang, Matthew Sharp,Chuxiong Wu,Qiang Zeng,Lannan Luo

PROCEEDINGS OF THE 32ND USENIX SECURITY SYMPOSIUM(2023)

引用 0|浏览8
暂无评分
摘要
NLP-inspired deep learning for binary code analysis demonstrates notable performance. Considering the diverse Instruction Set Architectures (ISAs) on the market, it is important to be able to analyze code of various ISAs. However, training a deep learning model usually requires a large amount of data, which poses a challenge for certain ISAs such as PowerPC that suffer from the "data scarcity" issue. For instance, acquiring a large dataset of PowerPC malware proves to be challenging. Moreover, given a binary analysis task and multiple ISAs, it takes much time and effort (e.g., for data collection, labeling and cleaning, and parameter tuning) to train one model per ISA. We propose a new direction, retargeted-architecture binary code analysis, to handle the data scarcity issue and alleviate the per-ISA effort. Our idea is to transfer knowledge from one ISA to others-that is, a model, trained with rich data and much time and effort for one ISA, can perform prediction for others without any modification. We showcase the idea through two important tasks: malware detection and function similarity detection. An extensive evaluation involving four ISAs (x86, ARM, MIPS, and PowerPC) demonstrates the effectiveness of the approach and the high performance is interpreted.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要