FlowEmbed: Binary function embedding model based on relational control flow graph and byte sequence.

Yongpan Wang, Chaopeng Dong, Siyuan Li, Fucai Luo, Renjie Su, Zhanwei Song, Hong Li

International Conference on Parallel and Distributed Systems (2023)

Abstract
Binary function embedding models are applicable to various downstream tasks within IoT device software systems and have demonstrated advantages in numerous binary analysis tasks, such as vulnerability (homologous) function search and compilation optimization option identification. However, current binary function embedding methods learn embeddings either from code sequences, which lack the program semantics of functions (e.g., control flow), or from program structure graphs, which omit global sequential information. As a result, these methods fall short of enabling models to learn the complete semantics of a function. In this paper, we introduce FlowEmbed, a novel approach that integrates control flow and global semantic learning for more complete code comprehension. First, FlowEmbed uses a relational control flow graph together with BERT and RGCN models to capture control flow semantics. Second, by applying the DPCNN model to a byte sequence constructed from the function's machine code, FlowEmbed captures the global sequential semantics of binary functions. In evaluations on three IoT-related tasks, FlowEmbed achieves a 20.6% improvement in compilation optimization option identification, a 1.8% improvement in binary function similarity analysis, and an 11.9% improvement in homologous function search. These results indicate that FlowEmbed is a valuable asset for binary analysis applications.
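
The two inputs named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the edge types (`FALLTHROUGH`, `JUMP`), the toy weights, the padding token, and the helper names are all assumptions made for illustration, and small hand-written vectors stand in for the BERT-derived basic-block features. The layer follows the standard R-GCN update, in which each node adds a self-transform to the per-relation mean of its transformed in-neighbors.

```python
# Hypothetical relation types; the abstract does not list the ones FlowEmbed uses.
FALLTHROUGH, JUMP = 0, 1
PAD = 256  # assumed padding token, outside the 0..255 byte range


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_rel, W_self):
    """One R-GCN step over a relational CFG.

    h      : list of node feature vectors (one per basic block)
    edges  : list of (src, dst, relation) triples
    W_rel  : dict mapping relation -> weight matrix
    W_self : weight matrix for the self-connection
    """
    out = [matvec(W_self, h_i) for h_i in h]
    # Group incoming neighbors per (destination, relation) for normalization.
    neighbors = {}
    for src, dst, rel in edges:
        neighbors.setdefault((dst, rel), []).append(src)
    for (dst, rel), srcs in neighbors.items():
        for src in srcs:
            msg = matvec(W_rel[rel], h[src])
            for k, m in enumerate(msg):
                out[dst][k] += m / len(srcs)
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in out]


def byte_sequence(machine_code, max_len=8):
    """Truncate or pad a function's raw bytes to a fixed-length token list."""
    seq = list(machine_code[:max_len])
    return seq + [PAD] * (max_len - len(seq))


# Toy CFG with three basic blocks and two relation types.
identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1, FALLTHROUGH), (0, 2, JUMP), (1, 2, FALLTHROUGH)]

updated = rgcn_layer(h, edges, {FALLTHROUGH: identity, JUMP: doubled}, identity)
tokens = byte_sequence(b"\x55\x48\x89\xe5\xc3")
```

In a full pipeline the node vectors would typically be pooled into one graph-level vector and combined with the output of a sequence model run over `tokens`. How FlowEmbed fuses the two is not stated in the abstract.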
Keywords
deep learning, binary function embedding, static analysis, binary code search, binary code similarity detection