CTDroid: Leveraging a Corpus of Technical Blogs for Android Malware Analysis

IEEE Transactions on Reliability(2020)

引用 25|浏览72
暂无评分
摘要
The rapid growth of Android malware results in a large body of approaches devoted to malware analysis by leveraging machine learning algorithms. However, the effectiveness of these approaches primarily depends on the manual feature engineering process, which is time-consuming and labor-intensive based on expert knowledge and intuition. In this paper, we propose an automatic approach that engineers informative features from a corpus of Android malware related technical blogs, which are written in a way that mirrors the human feature engineering process. However, there are two main challenges. First, it is difficult to recognize useful knowledge in the magnanimity information of thousands of blogs. To this end, we leverage natural language processing techniques to process the blogs and extract a set of sensitive behaviors that might do harmful activities to users potentially. Second, there exists a semantic gap between the extracted sensitive behaviors and the programming language. To this end, we propose two semantic matching rules to match the behaviors with concrete code snippets such that the apps can be tested experimentally. We design and implement a system called CTDroid for malware analysis, including malware detection (MD) and familial classification (FC). After the evaluation of CTDroid on a large scale of real malware and benign apps, the experimental results demonstrate that CTDroid can achieve 95.8% true positive rate with only 1% false positive rate for MD and 97.9% accuracy for FC. Furthermore, our proposed features are more informative than those of state-of-the-art approaches.
更多
查看译文
关键词
Malware,Blogs,Semantics,Feature extraction,Machine learning algorithms,Knowledge engineering,Smart phones
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要