Elisa: Eliciting Isa Of Raw Binaries For Fine-Grained Code And Data Separation

DETECTION OF INTRUSIONS AND MALWARE, AND VULNERABILITY ASSESSMENT, DIMVA 2018(2018)

引用 11|浏览107
暂无评分
摘要
Static binary analysis techniques are widely used to reconstruct the behavior and discover vulnerabilities in software when source code is not available. To avoid errors due to mis-interpreting data as machine instructions (or vice-versa), disassemblers and static analysis tools must precisely infer the boundaries between code and data. However, this information is often not readily available. Worse, compilers may embed small chunks of data inside the code section. Most state of the art approaches to separate code and data are rooted on recursive traversal disassembly, with severe limitations when dealing with indirect control instructions. We propose ELISA, a technique to separate code from data and ease the static analysis of executable files. ELISA leverages supervised sequential learning techniques to locate the code section(s) boundaries of header-less binary files, and to predict the instruction boundaries inside the identified code section. As a preliminary step, if the Instruction Set Architecture (ISA) of the binary is unknown, ELISA leverages a logistic regression model to identify the correct ISA from the file content. We provide a comprehensive evaluation on a dataset of executables compiled for different ISAs, and we show that our method is capable to identify code sections with a byte-level accuracy (F1 score) ranging from 98.13% to over 99.9% depending on the ISA. Fine-grained separation of code from embedded data on x86, x86-64 and ARM executables is accomplished with an accuracy of over 99.9%.
更多
查看译文
关键词
Code Section, Static Binary Analysis, Byte Frequency Distribution (BFD), Code Discovery, Symbolic Debugger
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要