ROMEO: A binary vulnerability detection dataset for exploring Juliet through the lens of assembly language.

arXiv (2023)

Abstract
Context: Automatic vulnerability detection on C/C++ source code has benefitted from the introduction of machine learning to the field, with many recent publications targeting this combination. In contrast, assembly language or machine code artifacts receive less attention, although there are compelling reasons to study them. They are more representative of what is executed, more easily incorporated in dynamic analysis, and in the case of closed-source code, there is no alternative.

Objective: We evaluate the representative capability of assembly language compared to C/C++ source code for vulnerability detection. Furthermore, we investigate the role of call graph context in detecting function-spanning vulnerabilities. Finally, we verify whether compiling a benchmark dataset compromises an experiment's soundness by inadvertently leaking label information.

Method: We propose ROMEO, a publicly available, reproducible and reusable binary vulnerability detection benchmark dataset derived from the synthetic Juliet test suite. Alongside, we introduce a simple text-based assembly language representation that includes context for function-spanning vulnerability detection and semantics to detect high-level vulnerabilities. It is constructed by disassembling the .text segment of the respective binaries.

Results: We evaluate an x86 assembly language representation of the compiled dataset, combined with an off-the-shelf classifier. It compares favorably to state-of-the-art methods, including those operating on the full C/C++ code. Including context information using the call graph improves detection of function-spanning vulnerabilities. There is no label information leaked during the compilation process.

Conclusion: Performing vulnerability detection on a compiled program instead of the source code is a worthwhile tradeoff. While certain information is lost, e.g., comments and certain identifiers, other valuable information is gained, e.g., about compiler optimizations.

© 2023 Elsevier Ltd. All rights reserved.
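
The idea of adding call graph context to a text-based assembly representation can be illustrated with a minimal sketch. This is not the authors' implementation: the toy instruction listings, the function names (`bad`, `bad_sink`), the `depth` parameter, and the output layout are all assumptions made for illustration. The sketch takes already-disassembled functions as plain strings and appends the text of each callee reachable within a given number of call graph hops.

```python
import re

# Hypothetical toy input: function name -> disassembled instructions.
# In a real pipeline these would come from disassembling the .text segment.
FUNCTIONS = {
    "bad": ["push rbp", "mov rbp, rsp", "call bad_sink", "pop rbp", "ret"],
    "bad_sink": ["push rbp", "mov rbp, rsp", "call strcpy", "pop rbp", "ret"],
}

# Matches a direct call to a named target, e.g. "call bad_sink".
CALL_RE = re.compile(r"^call\s+(\S+)$")


def callees(name, functions):
    """Return the direct callees of `name` that are defined in `functions`."""
    found = []
    for instruction in functions[name]:
        match = CALL_RE.match(instruction)
        if match and match.group(1) in functions and match.group(1) not in found:
            found.append(match.group(1))
    return found


def with_context(name, functions, depth=1):
    """Render `name` as text, followed by callees up to `depth` hops away."""
    seen = [name]
    frontier = [name]
    for _ in range(depth):
        next_frontier = []
        for func in frontier:
            for callee in callees(func, functions):
                if callee not in seen:
                    seen.append(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    blocks = [f"{func}:\n" + "\n".join(functions[func]) for func in seen]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # With depth=1 the sink function's instructions are included as context.
    print(with_context("bad", FUNCTIONS, depth=1))
```

With `depth=0` only the function itself is rendered. With `depth=1` the text also contains the callee that performs the copy, which is the kind of information a classifier would need to see for a vulnerability that spans more than one function.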
Keywords
binary vulnerability detection, assembly language, Juliet