Phishing Websites Detection via CNN and Multi-Head Self-Attention on Imbalanced Datasets

Computers & Security (2021)

Abstract
Phishing websites are a form of social engineering attack in which perpetrators imitate legitimate websites to lure people into visiting them, so as to illegally acquire users' identities, passwords, private data and even property. This attack poses a great threat to people and is becoming more and more severe. Many proposals for identifying phishing websites have shown their merits. For example, the classical proposal CNN-LSTM achieved very high precision by combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). However, although CNN has achieved great success in the AI area, LSTM still suffers from a bias issue, since it always treats later features as much more important than earlier ones. Meanwhile, because the self-attention mechanism can discover a text's inner dependency relationships, it has been widely applied to various deep learning-based Natural Language Processing (NLP) tasks. If we treat a URL as a text string, this mechanism can learn comprehensive URL representations. To further improve the accuracy of phishing website detection, in this paper we propose a novel CNN with self-attention, named self-attention CNN, for identifying phishing Uniform Resource Locators (URLs). Specifically, self-attention CNN first leverages a Generative Adversarial Network (GAN) to generate phishing URLs so as to balance the datasets of legitimate and phishing URLs. It then uses CNN and multi-head self-attention to construct a new classifier comprising four blocks, namely the input block, the attention block, the feature block and the output block. Finally, the trained classifier gives a high-accuracy result for an unknown website URL. Thorough experiments indicate that self-attention CNN achieves 95.6% accuracy, outperforming CNN-LSTM, single CNN and single LSTM by 1.4%, 4.6% and 2.1%, respectively. (c) 2021 Elsevier Ltd. All rights reserved.
Keywords
Phishing websites detection, Imbalanced dataset, Generative adversarial network, Convolutional neural network, Multi-head self-attention
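
The abstract's idea of treating a URL as a text string and applying multi-head self-attention to it can be illustrated with a minimal sketch. This is not the paper's classifier: the character vocabulary, the embedding size, the number of heads and all function names below are assumptions made for illustration, the weights are random rather than trained, and the GAN balancing step, the CNN layers and the four-block layout are not reproduced.

```python
import math
import random

# Hypothetical sizes and vocabulary; the abstract does not specify them.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
EMBED_DIM = 8
NUM_HEADS = 2
HEAD_DIM = EMBED_DIM // NUM_HEADS


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def embed_url(url, table):
    """Map each character to an embedding row; unknown characters share row 0."""
    return [table[VOCAB.find(ch) + 1] for ch in url.lower()]


def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention for one head; returns (output, weights)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, [list(col) for col in zip(*k)])  # q times k transposed
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(x, heads):
    """Run every head on the same input and concatenate the outputs per position."""
    outputs = [attention_head(x, *h)[0] for h in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]


rng = random.Random(0)
table = rand_matrix(len(VOCAB) + 1, EMBED_DIM, rng)
# Each head owns three projection matrices: query, key and value.
heads = [
    tuple(rand_matrix(EMBED_DIM, HEAD_DIM, rng) for _ in range(3))
    for _ in range(NUM_HEADS)
]

x = embed_url("http://example.com/login", table)
attended = multi_head_self_attention(x, heads)
print(len(attended), len(attended[0]))  # one EMBED_DIM-wide vector per URL character
```

Every character position attends to every other position in the URL, so an early token such as the scheme or host can influence the representation as directly as a late one. This is the property the abstract contrasts with the LSTM's tendency to favour later features.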