Phishing URL detection generalisation using Unsupervised Domain Adaptation

Fariza Rashid, Ben Doyle, Soyeon Caren Han, Suranga Seneviratne

Computer Networks (2024)

Abstract
Phishing attacks are a prevailing problem in cybersecurity; in many data breaches, the initial entry can be traced back to phishing. URL-based phishing detection is one of several approaches to detecting phishing attempts, and it decides whether a given URL is phishing using only the properties of the URL itself. While multiple existing works use machine learning and deep learning to detect phishing URLs, in this paper we show that such methods lack generalisation (i.e., they work effectively only when the test set is split from the same dataset as the training set). This is a significant issue, since the vast majority of phishing attempts are short-lived and use freshly created domain names. In addition, many network vantage points and middleboxes record URLs in slightly different formats, so URL data collected at different companies may differ. To address this, we propose an Unsupervised Domain Adaptation-based framework that increases model transferability between datasets. We evaluate our approach on three datasets and show that it increases the cross-dataset F1 score by 0.06 on average and, in some cases, by as much as approximately 0.2.
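The abstract does not specify which URL properties, classifier or adaptation method the paper uses, so the following is only a minimal sketch under stated assumptions. It illustrates three ideas from the abstract. The first is deciding from the URL string alone: `url_features` is a hypothetical set of lexical features. The second is that a different logging format can shift those features: the same URL logged without a scheme loses its parsed hostname. The third is the cross-dataset F1 score used for evaluation. The `standardise` helper is a swapped-in, very simple unsupervised adaptation baseline (per-domain z-scoring, which aligns each feature's mean and variance across domains without target labels). It is not the authors' framework.

```python
from statistics import mean, pstdev
from urllib.parse import urlsplit


def url_features(url: str) -> list[float]:
    """Hypothetical lexical features computed from the URL string alone."""
    # urlsplit only recognises a hostname when it is introduced by '//',
    # so a URL logged without a scheme yields an empty hostname here.
    host = urlsplit(url).hostname or ""
    return [
        float(len(url)),                       # total URL length
        float(sum(c.isdigit() for c in url)),  # number of digits
        float(host.count(".")),                # dots in the hostname
        float("-" in host),                    # hyphen in the hostname
        float("@" in url),                     # '@' symbol anywhere in the URL
    ]


def standardise(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each feature with statistics of this domain only (no labels used)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant feature: avoid division by zero
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """F1 for the positive (phishing) class: 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, `url_features("http://a-b.example.com/x1")` and `url_features("a-b.example.com/x1")` describe the same resource but produce different vectors. A classifier trained on one format and tested on the other therefore sees shifted inputs. In a cross-dataset setting, `standardise` would be applied separately to the source and target feature matrices before training and testing.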
Keywords
Phishing detection, Unsupervised domain adaptation, URL classification