On the Cross-Dataset Generalization of Machine Learning for Network Intrusion Detection
CoRR (2024)
Abstract
Network Intrusion Detection Systems (NIDS) are a fundamental tool in
cybersecurity. Their ability to generalize across diverse networks is a
critical factor in their effectiveness and a prerequisite for real-world
applications. In this study, we conduct a comprehensive analysis of the
generalization of machine-learning-based NIDS through extensive
experimentation in a cross-dataset framework. We employ four machine learning
classifiers and utilize four datasets acquired from different networks:
CIC-IDS-2017, CSE-CIC-IDS2018, LycoS-IDS2017, and LycoS-Unicas-IDS2018.
Notably, the last dataset is a novel contribution, where we apply corrections
based on LycoS-IDS2017 to the well-known CSE-CIC-IDS2018 dataset. The results
show nearly perfect classification performance when the models are trained and
tested on the same dataset. However, when training and testing the models in a
cross-dataset fashion, the classification accuracy is largely commensurate with
random chance except for a few combinations of attacks and datasets. We employ
data visualization techniques to provide valuable insights into the
patterns in the data. Our analysis unveils the presence of anomalies in the
data that directly hinder the classifiers' capability to generalize the learned
knowledge to new scenarios. This study enhances our comprehension of the
generalization capabilities of machine-learning-based NIDS, highlighting the
significance of acknowledging data heterogeneity.
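
The cross-dataset protocol described above (train on one dataset, test on every other) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the nearest-centroid classifier, the synthetic two-feature data, and the names `cross_dataset_matrix`, `toy_dataset`, `net_a`, and `net_b` are all hypothetical stand-ins, and the study's four classifiers and four datasets are not reproduced here.

```python
import random
from itertools import product


class NearestCentroid:
    """Tiny stand-in classifier: predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def closest(row):
            return min(
                self.centroids_,
                key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[c])
                ),
            )
        return [closest(row) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def split(X, y, test_fraction=0.3, seed=0):
    """Shuffle once and return (X_train, y_train, X_test, y_test)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return (
        [X[i] for i in train], [y[i] for i in train],
        [X[i] for i in test], [y[i] for i in test],
    )


def cross_dataset_matrix(datasets, make_model):
    """Return {(train_name, test_name): accuracy} for every ordered pair.

    The diagonal (same name twice) is the usual same-dataset evaluation;
    off-diagonal entries are the cross-dataset evaluations.
    """
    splits = {name: split(X, y) for name, (X, y) in datasets.items()}
    scores = {}
    for train_name, test_name in product(datasets, repeat=2):
        X_tr, y_tr, _, _ = splits[train_name]
        _, _, X_te, y_te = splits[test_name]
        model = make_model().fit(X_tr, y_tr)
        scores[(train_name, test_name)] = accuracy(y_te, model.predict(X_te))
    return scores


def toy_dataset(attack_center, n=200, seed=0):
    """Synthetic flows (assumption): benign near (0, 0), attack near attack_center."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        X.append([rng.gauss(0, 0.3), rng.gauss(0, 0.3)])
        y.append(0)  # benign
        X.append([rng.gauss(attack_center[0], 0.3),
                  rng.gauss(attack_center[1], 0.3)])
        y.append(1)  # attack
    return X, y


# Two hypothetical networks whose "attack" traffic looks different.
datasets = {
    "net_a": toy_dataset((3.0, 3.0), seed=1),
    "net_b": toy_dataset((-3.0, -3.0), seed=2),
}
scores = cross_dataset_matrix(datasets, NearestCentroid)
for (train_name, test_name), acc in sorted(scores.items()):
    print(f"train={train_name} test={test_name} accuracy={acc:.2f}")
```

In this toy setup the attack class is deliberately placed in a different region of feature space in each synthetic network, so the diagonal entries are high while the off-diagonal entries fall to roughly the benign share of the test set. That mirrors the qualitative pattern the abstract reports, but it says nothing about the magnitude of the effect on the real datasets.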