NLabel: An Accurate Familial Clustering Framework for Large-Scale Weakly-Labeled Malware

Yannan Liu, Yabin Lai, Kaizhi Wei, Liang Gu, Zhengzheng Yan

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2020

Abstract
Automatic family labeling of malware is in demand, especially at today's malware scale. Commercial anti-virus (AV) engines provide an efficient way to label families, but their raw labels tend to be inconsistent. Prior work mitigates this inconsistency by detecting aliases and applying majority voting to obtain the final family label. However, these methods resolve the inconsistency in a coarse-grained and fragile manner, and the resulting family label is sometimes inaccurate. In this work, we propose NLabel, which performs familial clustering based on AV engines' raw labels. First, NLabel uses word embedding techniques to capture the similarity among raw labels, transforming inconsistent labels of the same family into similar semantic representations and mitigating the inconsistency at a finer granularity. Second, we propose a hierarchical family clustering method to improve performance on large-scale data sets. Experimental results show that our method outperforms the state of the art.
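The abstract does not say which embedding model or clustering algorithm NLabel uses, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that co-occurrence count vectors can stand in for a learned word embedding, and that a two-level grouping (exact token-set buckets first, then cosine-similarity merging) can stand in for the hierarchical clustering. The stop list `GENERIC`, the threshold `0.9`, and the sample labels are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical stop list of tokens that carry no family information.
GENERIC = {"trojan", "win32", "w32", "gen", "generic", "ransom"}


def tokenize(label):
    """Split a raw AV label into lowercase tokens, dropping short and generic ones."""
    tokens = re.split(r"[^a-z0-9]+", label.lower())
    return [t for t in tokens if len(t) > 2 and t not in GENERIC]


def token_vectors(sample_tokens):
    """Represent each token by its co-occurrence counts across samples.

    This is a stand-in for a learned word embedding: aliases that appear
    on the same samples end up with similar vectors.
    """
    vecs = defaultdict(lambda: defaultdict(float))
    for tokens in sample_tokens.values():
        for a in tokens:
            for b in tokens:
                vecs[a][b] += 1.0
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def cluster(raw_labels, threshold=0.9):
    """Group samples into families from their raw AV labels."""
    sample_tokens = {
        s: frozenset(t for label in labels for t in tokenize(label))
        for s, labels in raw_labels.items()
    }
    vecs = token_vectors(sample_tokens)

    # Level 1 (assumed): bucket samples with identical token sets, so the
    # pairwise comparison below runs over buckets instead of samples.
    buckets = defaultdict(list)
    for s, tokens in sample_tokens.items():
        buckets[tokens].append(s)
    keys = list(buckets)

    # A bucket's vector is the sum of its tokens' vectors.
    bucket_vec = {}
    for key in keys:
        acc = defaultdict(float)
        for t in key:
            for dim, w in vecs[t].items():
                acc[dim] += w
        bucket_vec[key] = acc

    # Level 2 (assumed): single-link merging of buckets via union-find.
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(keys, 2):
        if cosine(bucket_vec[a], bucket_vec[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for k in keys:
        groups[find(k)].extend(buckets[k])
    return sorted(sorted(g) for g in groups.values())


# Hypothetical raw labels: s2 and s3 share no token, but s1 carries both aliases.
raw_labels_demo = {
    "s1": ["Trojan.Win32.Zbot.a", "W32/Zeus.gen"],
    "s2": ["Trojan.Zbot"],
    "s3": ["Win32.Zeus.B"],
    "s4": ["Ransom.WannaCry", "Trojan.Wcry.a"],
    "s5": ["W32/WannaCry.gen"],
}
print(cluster(raw_labels_demo))
```

In this toy input, samples s2 and s3 have no token in common, yet they land in the same cluster because the aliases co-occur on s1. Exact string matching on family names would miss that link.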
Keywords
business Anti-Virus engines, efficient family labeling method, aliases, majority voting, final family label, malware scale, automatic family labeling, large-scale weakly-labeled malware, accurate familial clustering framework, large-scale data sets, hierarchical family clustering method, inconsistent labels, raw labels, AV engines, NLabel, vulnerable manner, coarse-grained manner