Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis.

CODASPY(2018)

引用 22|浏览62
暂无评分
摘要
Fuzzy hashing algorithms provide a convenient way of summarizing in a compact form the content of files, and of looking for similarities between them. Because of this, they are widely used in the security and forensics communities to look for similarities between binary program files; one version of them, ssdeep, is the de facto standard to share information about known malware. Fuzzy hashes are quite pervasive, but no study so far answers conclusively the question of which (if any) fuzzy hashing algorithms are suited to detect similarities between programs, where we consider as similar those programs that have code or libraries in common. We measure how four popular algorithms perform in different scenarios: when they are used to correlate statically-compiled files with the libraries they use, when compiled with different flags or different compilers, and when applied to programs that share a large part of their source code. Perhaps more importantly, we provide interpretations that explain the reasons why results vary, sometimes widely, among apparently very similar use cases. We find that the low-level details of the compilation process, together with the technicalities of the hashing algorithms, can explain surprising results such as similarities dropping to zero with the change of a single assembly instruction. More in general, we see that ssdeep, the de facto standard for this type of analysis, performs definitely worse than alternative algorithms; we also find that the best choice of algorithm to use varies depending on the particularities of the use case scenario.
更多
查看译文
关键词
binary analysis, fuzzy hash, malware, approximate matching
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要