Predicting likelihood of legitimate data loss in email DLP

Future Generation Computer Systems(2020)

引用 16|浏览8
暂无评分
摘要
The volume and variety of data collected for modern organisations has increased significantly over the last decade necessitating the detection and prevention of disclosure of sensitive data. Data loss prevention is an embedded process used to protect against disclosure of sensitive data to external uncontrolled environments. A typical Data Loss Prevention (DLP) system uses custom policies to identify and prevent accidental and malicious data leakage producing large number of security alerts including significant volume of false positives. Consequently, identifying legitimate data loss can be very challenging as each incident comprises of different characteristics often requiring extensive intervention by a domain expert to review alerts individually. This limits the ability to detect data loss alerts in real-time making organisations vulnerable to financial and reputational damages. The aim of this research is to strengthen data loss detection capabilities of a DLP system by implementing a machine learning model to predict the likelihood of legitimate data loss. We conducted extensive experimentation using Decision Tree and Random Forest algorithms with historical email incident data collected by a globally established telecommunication enterprise. The final model produced with Random Forest algorithm was identified as the most effective as it was successfully able to predict approximately 95% data loss incidents accurately with an average true positive value of 90%. Furthermore, the proposed solution successfully enables identification of legitimate data loss in email DLP whilst facilitating prioritisation of real data loss through human-understandable explanation of the decision thereby improving the efficiency of the process.
更多
查看译文
关键词
Data loss prevention,Email DLP,Insider threats,Threat prediction,Machine learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要