Enabling Privacy-Preserving Cyber Threat Detection with Federated Learning
arXiv (2024)
Abstract
Despite achieving good performance and wide adoption, machine learning based
security detection models (e.g., malware classifiers) are subject to concept
drift and the evasive evolution of attackers, which makes up-to-date threat
data a necessity. However, due to the enforcement of various privacy
protection regulations (e.g., GDPR), it is becoming increasingly challenging
or even prohibitive for security vendors to collect individual-relevant and
privacy-sensitive threat datasets, e.g., SMS spam/non-spam messages from
mobile devices. To address these obstacles, this study systematically profiles
the (in)feasibility of federated learning (FL) for privacy-preserving cyber
threat detection in terms of effectiveness, byzantine resilience, and
efficiency. This is made possible by the construction of multiple threat
datasets and threat detection models, and, more importantly, the design of
realistic and security-specific experiments.
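The abstract does not specify the FL training algorithm; the sketch below assumes standard federated averaging (FedAvg) over a simple logistic-regression detector, with hypothetical helper names (`local_update`, `fedavg_round`). It is a minimal illustration of the kind of training loop being evaluated, not the authors' implementation.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client's local gradient steps on a logistic-regression model."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        grad = X.T @ (p - y) / len(y)      # gradient of the log-loss
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients, lr=0.1):
    """One FedAvg round: local training per client, size-weighted averaging."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(w_global.copy(), X, y, lr=lr))
        sizes.append(len(y))
    sizes = np.asarray(sizes, dtype=float)
    return np.average(updates, axis=0, weights=sizes / sizes.sum())

# Toy usage: 10 clients with synthetic data, 50 communication rounds.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(100, 20)),
            rng.integers(0, 2, size=100).astype(float)) for _ in range(10)]
w = np.zeros(20)
for _ in range(50):
    w = fedavg_round(w, clients)
```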
We evaluate FL on two representative threat detection tasks, namely SMS spam
detection and Android malware detection. The results show that FL-trained
detection models can achieve performance comparable to that of centrally
trained counterparts. Also, most non-IID data distributions have either a
minor or negligible impact on model performance, while a highly skewed
label-based non-IID distribution can incur non-negligible fluctuation and
delay in FL training. Then, under a realistic threat model, FL turns out to be
resistant to both data poisoning and model poisoning attacks. In particular, a
practical data poisoning attack costs no more than a 0.14% loss in model
accuracy. Regarding FL efficiency, a bootstrapping strategy turns out to be
effective in mitigating the training delay observed in label-based non-IID
scenarios.
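The abstract does not describe how the label-based non-IID distributions are constructed; a common proxy in FL studies is Dirichlet label partitioning, where a smaller concentration parameter alpha produces stronger label skew. The sketch below (the helper name `dirichlet_label_partition` is hypothetical) shows one way to generate such partitions for skew experiments of this kind.

```python
import numpy as np

def dirichlet_label_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet-controlled label skew.

    Smaller alpha -> each client's label mix is more skewed (stronger non-IID).
    """
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Draw this class's per-client proportions from Dirichlet(alpha).
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, shard in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(shard.tolist())
    return client_idx

# Example: binary spam labels, 10 clients, strong skew (alpha = 0.1).
labels = np.random.default_rng(1).integers(0, 2, size=1000)
parts = dirichlet_label_partition(labels, n_clients=10, alpha=0.1)
```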