Fuzziness based semi-supervised learning approach for intrusion detection system

Inf. Sci., Volume 378, Issue C, 2017, Pages 484-497.

Keywords:
SUPPORT VECTOR MACHINE; FUNCTIONAL-LINK NET; NEURAL-NETWORKS; RANDOM WEIGHTS; CLASSIFICATION

Abstract:

Countering cyber threats, especially attack detection, is a challenging area of research in the field of information assurance. Intruders use polymorphic mechanisms to masquerade the attack payload and evade the detection techniques. Many supervised and unsupervised learning approaches from the field of machine learning and pattern recogni...

Introduction
  • Intrusion detection (ID) is the process of monitoring, detecting, and analyzing events that are considered violations of the security policies of a networked environment [45].
  • Denning [12] introduced the concept of detecting cyber-based attacks on computer networks by providing a framework for an intrusion detection system (IDS), which is based on the hypothesis that security violations can be detected by monitoring system audit records for abnormal patterns of system usage.
  • Organizations deploy their own access controls to grant or restrict the level of access for their assets but this approach.
  • According to Qiu et al. [40], Hernández-Pereira et al. [16], and Yan and Yu [56], an intrusion can be described as a set of actions that attempt to compromise the confidentiality, integrity, or availability (CIA) of information resources; it is therefore necessary to take different measures to minimize such risks.
Highlights
  • Intrusion detection (ID) is the process of monitoring, detecting, and analyzing events that are considered violations of the security policies of a networked environment [45]
  • We train the neural network with random weights (NNRw) classifier on the original training set Tr, then apply it to the unlabeled set U to obtain three groups of samples (low, mid, and high) according to their fuzziness
  • The input weights wi and biases bi at the hidden layer of the neural network with random weights are random variables that follow a uniform distribution over the interval
  • In this paper we have designed a new semi-supervised learning algorithm for improving classifier performance on intrusion detection datasets by investigating a divide-and-conquer strategy in which unlabeled samples with their predicted labels are categorized according to the magnitude of fuzziness
  • The study in this paper is limited to achieving better classification accuracy by exploiting the relationship between the fuzziness output by the classifier on a group of samples and their misclassification rate
  • Our future research will be directed towards applying this strategy to improve the effectiveness of intrusion detection systems for detecting multiple types of attacks
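
The divide-and-conquer procedure described in the highlights can be sketched in Python as follows. This is one possible reading and not the authors' implementation. The sigmoid activation, the pseudo-inverse solve for the output weights, the clip-and-normalize step that turns outputs into membership degrees, the De Luca-Termini style fuzziness measure, and the tercile split into low/mid/high groups are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def train_nnrw(X, T, n_hidden=50, low=-1.0, high=1.0, seed=0):
    """Fit a single-hidden-layer network whose hidden weights stay random.

    Input weights and biases are drawn uniformly from [low, high] (assumed
    interval); only the output weights are learned, by least squares.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(low, high, size=(X.shape[1], n_hidden))  # input weights
    b = rng.uniform(low, high, size=n_hidden)                # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid layer (assumed)
    beta = np.linalg.pinv(H) @ T                             # output weights
    return W, b, beta

def membership(model, X):
    """Turn raw network outputs into membership vectors that sum to 1 (assumed)."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    Y = np.clip(H @ beta, 1e-6, None)           # keep every output positive
    return Y / Y.sum(axis=1, keepdims=True)

def fuzziness(mu):
    """De Luca-Termini style fuzziness of each membership vector (one per row)."""
    mu = np.clip(mu, 1e-12, 1 - 1e-12)          # avoid log(0)
    return -np.mean(mu * np.log(mu) + (1 - mu) * np.log(1 - mu), axis=1)

def split_by_fuzziness(F):
    """Label samples 0=low, 1=mid, 2=high using terciles (an assumption)."""
    lo_cut, hi_cut = np.quantile(F, [1 / 3, 2 / 3])
    return np.where(F <= lo_cut, 0, np.where(F >= hi_cut, 2, 1))

def fuzziness_ssl(X_tr, y_tr, X_u, n_classes, **kw):
    """Train on Tr, categorize U by fuzziness, retrain on Tr plus low/high groups."""
    model = train_nnrw(X_tr, np.eye(n_classes)[y_tr], **kw)
    mu_u = membership(model, X_u)
    groups = split_by_fuzziness(fuzziness(mu_u))
    keep = groups != 1                          # drop the mid-fuzziness group
    X_new = np.vstack([X_tr, X_u[keep]])
    y_new = np.concatenate([y_tr, mu_u[keep].argmax(axis=1)])  # predicted labels
    return train_nnrw(X_new, np.eye(n_classes)[y_new], **kw), groups
```

The `low` and `high` arguments expose the sampling interval of the random weights, so the sensitivity to that interval can be explored by calling `fuzziness_ssl` with different values.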
Results
  • After required data preprocessing and necessary data scaling, the authors conduct the experiment in two modes as mentioned in Table 8 to evaluate the performance of the proposed methodology.

    Experiment-1: In the first experiment, the authors convert all symbolic attributes into indicator variables as discussed earlier.
  • The authors test the methodology and obtain the testing accuracy on both the KDDTest+ and KDDTest−21 datasets.
  • In the second phase, the authors retrain the classifier with a new training set Tr’, in which the samples belonging to the low and high fuzziness groups are incorporated into Tr
  • Testing accuracies on both datasets are depicted in Table 9.
  • A smaller interval, i.e., [0,1], leads to better accuracy as shown in Fig. 5(a)
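
The preprocessing mentioned for Experiment-1 (symbolic attributes converted to indicator variables, followed by scaling) could look like the minimal Python sketch below. The helper names are hypothetical, min-max scaling is an assumption because the summary does not say which scaling was used, and the example values are illustrative only.

```python
import numpy as np

def to_indicators(values):
    """Map a symbolic column to one 0/1 indicator column per distinct value."""
    categories = sorted(set(values))
    index = {c: j for j, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for i, v in enumerate(values):
        out[i, index[v]] = 1.0
    return out, categories

def min_max_scale(X):
    """Scale each numeric column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (X - lo) / span

# Illustrative symbolic column with three distinct values
indicators, categories = to_indicators(["tcp", "udp", "tcp", "icmp"])
```

Each symbolic feature is expanded this way and concatenated with the scaled numeric features before training.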
Conclusion
  • In this paper the authors have designed a new SSL algorithm for improving the classifier performance on ID datasets by investigating a divide-and-conquer strategy in which unlabeled samples with their predicted labels are categorized according to the magnitude of fuzziness.
  • It is experimentally observed that this methodology is an effective way to improve the classification accuracy when the authors train the NNRw to get the fuzzy vector output and perform the sample categorization on unlabeled samples according to their fuzziness quantity.
  • The authors' future research will be directed towards applying this strategy to improve the effectiveness of IDSs for detecting multiple types of attacks
Tables
  • Table 1: Fuzziness based divide-and-conquer methodology
  • Table 2: Proposed algorithm using NNRw
  • Table 3: Description of input features
  • Table 4: Attack types
  • Table 5: Symbolic features with values
  • Table 6: Clustering of flag feature
  • Table 7: Clustering of service feature
  • Table 8: List of experiments
  • Table 9: Testing accuracy on KDDTest+ and KDDTest−21
  • Table 10: Testing accuracy on KDDTest+ and KDDTest−21 dataset after incorporating other fuzziness groups into the training set
  • Table 11: Table 11
  • Table 12: Table 12
  • Table 13: Performance comparison between different classifiers and proposed algorithm
Funding
  • The authors would like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for its funding of this research through the Research Group Project no
  • This research is also supported by China Postdoctoral Science Foundation (2015M572361), Basic Research Project of Knowledge Innovation Program in Shenzhen (JCYJ20150324140036825), and National Natural Science Foundations of China (61503252 and 71371063)
Reference
  • D. Aha, D. Kibler, M. Albert, Instance-based learning algorithms, Mach. Learn. 6 (1) (1991) 37–66.
  • M. Alhamdoosh, D.H. Wang, Fast decorrelated neural network ensembles with random weights, Inf. Sci. 264 (2014) 104–117.
  • S. Baluja, Using labeled and unlabeled data for probabilistic modeling of face orientation, Int. J. Pattern Recognit. Artif. Intell. 14 (08) (2000) 1097–1107.
  • A. Blum, S. Chawla, Learning from labeled and unlabeled data using graph mincuts, in: Proceedings of the Eighteenth International Conference on Machine Learning, 2001, pp. 19–26.
  • A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the Eleventh Annual Conference on Computational Learning Theory - COLT’ 98, 1998, pp. 92–100.
  • S. Bosworth, M. Kabay, Computer Security Handbook, John Wiley & Sons, New York, 2002.
  • F.L. Cao, H.L. Ye, D.H. Wang, A probabilistic learning algorithm for robust modeling using neural networks with random weights, Inf. Sci. 313 (2015)
  • C. Chen, Y. Gong, Y. Tian, Semi-supervised learning methods for network intrusion detection, in: Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics, 2008, pp. 2603–2608.
  • W. Chen, Y. Shao, N. Hong, Laplacian smooth twin support vector machine for semi-supervised classification, Int. J. Mach. Learn. Cybern. 5 (3) (2013)
  • J. Chen, Y. Wang, X. Wang, On-demand security architecture for cloud computing, Computer 45 (7) (2012) 73–78.
  • A. De Luca, S. Termini, A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory, Inf. Control 20 (4) (1972) 301–312.
  • D. Denning, An intrusion-detection model, IEEE Trans. Softw. Eng. 13 (2) (1987) 222–232.
  • A. Fujino, N. Ueda, K. Saito, A hybrid generative/discriminative classifier design for semi-supervised learning, Trans. Jpn. Soc. Artif. Intell. 21 (2006)
  • Q. Gao, Y. Huang, X. Gao, W. Shen, H. Zhang, A novel semi-supervised learning for face recognition, Neurocomputing 152 (2015) 69–76.
  • Y. He, X.Z. Wang, J.Z.X. Huang, Fuzzy nonlinear regression analysis using a random weight network, Inf. Sci. (2016) In press, doi:10.1016/j.ins.2016.01.
  • E. Hernández-Pereira, J. Suárez-Romero, O. Fontenla-Romero, A. Alonso-Betanzos, Conversion methods for symbolic features: a comparison applied to an intrusion detection problem, Expert Syst. Appl. 36 (7) (2009) 10612–10617.
  • B. Igelnik, Y.-H. Pao, Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Trans. Neural Netw. 6 (6) (1995) 1320–1329.
  • KDDCup 1999 Data, 2015. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. [Accessed: 23-06-15].
  • J. Keller, M. Gray, J. Givens, A fuzzy k-nearest neighbor algorithm, IEEE Trans. Syst. Man Cybern. 15 (4) (1985) 580–585.
  • C. Kruegel, F. Valeur, G. Vigna, Intrusion Detection and Correlation, Springer, New York, 2005.
  • T. Lane, A decision-theoretic, semi-supervised model for intrusion detection, in: Machine Learning and Data Mining for Computer Security: Methods and Applications (series Advanced Information and Knowledge Processing), Springer, London, 2006, pp. 157–177, doi:10.1007/1-84628-253-5_10.
  • P. Laskov, P. Düssel, C. Schäfer, K. Rieck, Learning intrusion detection: Supervised or unsupervised? in: Proceedings of the Conference on Image Analysis and Processing ICIAP 2005, 2005, pp. 50–57.
  • C. Lee, D.A. Landgrebe, Decision boundary feature extraction for neural networks, IEEE Trans. Neural Netw. 8 (1) (1997) 75–83.
  • Y. Liao, V. Vemuri, Use of k-nearest neighbor classifier for intrusion detection, Comput. Secur. 21 (5) (2002) 439–448.
  • M. Luo, L. Wang, H. Zhang, J. Chen, A research on intrusion detection based on unsupervised clustering and support vector machine, in: Proceedings of Information and Communications Security: 5th International Conference, ICICS 2003, Huhehaote, China, October 10–13, 2003 (series Lecture Notes in Computer Science), vol. 2836, Springer, Berlin Heidelberg, 2003, pp. 325–336, doi:10.1007/978-3-540-39927-8_30.
  • A. Mahmood, T. Li, Y. Yang, H. Wang, M. Afzal, Semi-supervised evolutionary ensembles for web video categorization, Knowl. Based Syst. 76 (2015) 53–66.
  • U. Maulik, D. Chakraborty, A novel semisupervised SVM for pixel classification of remote sensing imagery, Int. J. Mach. Learn. Cybern. 3 (3) (2011) 247–258.
  • M. Meng, J. Wei, J. Wang, Q. Ma, X. Wang, Adaptive semi-supervised dimensionality reduction based on pairwise constraints weighting and graph optimizing, Int. J. Mach. Learn. Cybern. (2015) in press, doi:10.1007/s13042-015-0380-3 (accessed 19.08.15).
  • S. Mukkamala, G. Janoski, A. Sung, Intrusion detection using neural networks and support vector machines, in: Proceedings of the 2002 International Joint Conference on Neural Networks, IJCNN'02 (Cat. No.02CH37290), vol. 2, 2002, pp. 1702–1707.
  • S. Mukkamala, A. Sung, Detecting denial of service attacks using support vector machines, in: Proceedings of the Twelfth IEEE International Conference on Fuzzy Systems, 2003.
  • S. Mukkamala, A. Sung, A. Abraham, Intrusion detection using an ensemble of intelligent paradigms, J. Netw. Comput. Appl. 28 (2005) 167.
  • J. Neter, Applied Linear Statistical Models, WCB/McGraw-Hill, Boston, 1996.
  • K. Nigam, R. Ghani, Analyzing the effectiveness and applicability of co-training, in: Proceeding of the Ninth International Conference on Information and Knowledge (CIKM-2000), 2000.
  • K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Mach. Learn. 39 (2–3) (2000) 103–134.
  • NSL-KDD Data Set, [Online]. Available: http://nsl.cs.unb.ca/NSL-KDD/. [Accessed: 23-06-15].
  • B. Pan, J. Lai, L. Shen, Ideal regularization for learning kernels from labels, Neural Netw. 56 (2014) 22–34.
  • F. Pan, J. Wang, X. Lin, Local margin based semi-supervised discriminant embedding for visual recognition, Neurocomputing 74 (5) (2011) 812–819.
  • Y. Pao, G. Park, D. Sobajic, Learning and generalization characteristics of the random vector functional-link net, Neurocomputing 6 (2) (1994) 163–180.
  • Z. Qi, Y. Tian, L. Niu, B. Wang, Semi-supervised classification with privileged information, Int. J. Mach. Learn. Cybern. 6 (4) (2015) 667–676.
  • M. Qiu, L. Zhang, Z. Ming, Z. Chen, X. Qin, L. Yang, Security-aware optimization for ubiquitous computing systems with SEAT graph approach, J. Comput. Syst. Sci. 79 (5) (2013) 518–529.
  • E. Riloff, J. Wiebe, T. Wilson, Learning subjective nouns using extraction pattern bootstrapping, in: Proceedings of the Seventh conference on Natural language learning at HLT-NAACL 2003, vol. 4, 2003, pp. 25–32.
  • C. Rosenberg, M. Hebert, H. Schneiderman, Semi-supervised self-training of object detection models, in: Proceedings of the 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION’05), 1, 2005, pp. 29–36.
  • D. Sanchez, E. Trillas, Measures of fuzziness under different uses of fuzzy sets, Commun. Comput. Inf. Sci. 298 (2012) 25–34.
  • S. Scardapane, D.H. Wang, M. Panella, A. Uncini, Distributed learning for random vector functional-link networks, Inf. Sci. 301 (2015) 271–284.
  • K. Scarfone, P. Mell, 2007, SP 800–94. Guide to Intrusion Detection and Prevention Systems (IDPS). National Institute of Standards & Technology, Gaithersburg, MD, United States.
  • W. Schmidt, M. Kraaijveld, R. Duin, Feedforward neural networks with random weights, in: Proceedings of the Eleventh IAPR International Conference on Pattern Recognition, Conference B: Pattern recognition Methodology and Systems, 1992, pp. 1–4.
  • C. Shang, S. Feng, Z. Zhao, J. Fan, Efficiently detecting overlapping communities using seeding and semi-supervised learning, Int. J. Mach. Learn. Cybern. (2015) in press, doi:10.1007/s13042-015-0338-5 (accessed 19.08.15).
  • M. Tavallaee, E. Bagheri, W. Lu, A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in: Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, 2009.
  • H.T. Braake, G.V. Straten, Random activation weight neural net (RAWN) for fast non-iterative training, Eng. Appl. Artif. Intell. 8 (1) (1995) 71–80.
  • V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
  • X. Wang, R.A.R. Ashfaq, A. Fu, Fuzziness based sample categorization for classifier performance improvement, J. Intell. Fuzzy Syst. 29 (3) (2015) 1185–1196.
  • H. Wang, R. Nie, X. Liu, T. Li, Constraint projections for semi-supervised affinity propagation, Knowl. Based Syst. 36 (2012) 315–321.
  • X. Wang, H.J. Xing, Y. Li, Q. Hua, C.R. Dong, W. Pedrycz, A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning, IEEE Trans. Fuzzy Syst. 23 (5) (2015) 1638–1654.
  • J. Xie, K. Hone, W. Xie, X. Gao, Y. Shi, X. Liu, Extending twin support vector machine classifier for multi-category classification problems, Intell. Data Anal. 17 (4) (2013) 649–664.
  • Y. Yam, T. Chow, C. Leung, A new method in determining initial weights of feedforward neural networks for training enhancement, Neurocomputing 16 (1) (1997) 23–32.
  • Q. Yan, F. Yu, Distributed denial of service attacks in software-defined networking with cloud computing, IEEE Commun. Mag. 53 (4) (2015) 52–59.
  • D. Yarowsky, Unsupervised word sense disambiguation rivaling supervised methods, in: Proceedings of the Thirty Third Annual Meeting on Association for Computational Linguistics, 1995, pp. 189–196.
  • Z. You, J.Z. Yu, L. Zhu, S. Li, Z.K. Wen, A mapreduce based parallel SVM for large-scale predicting protein–protein interactions, Neurocomputing 145 (2014) 37–43.
  • Y. Yuan, M.J. Shaw, Induction of fuzzy decision trees, Fuzzy Sets Syst. 69 (1995) 125–139.
  • L. Zadeh, Probability measures of fuzzy events, J. Math. Anal. Appl. 23 (2) (1968) 421–427.
  • J.W. Zhao, Z.H. Wang, F.L. Cao, D.H. Wang, A local learning algorithm for random weights networks, Knowl. Based Syst. 74 (2015) 159–166.
  • M. Zhao, T. Chow, Z. Wu, Z. Zhang, B. Li, Learning from normalized local and global discriminative information for semi-supervised regression and dimensionality reduction, Inf. Sci. 324 (2015) 286–309.
  • D. Zhou, J. Huang, B. Schölkopf, Learning from labeled and unlabeled data on a directed graph, in: Proceedings of the Twenty Second International Conference on Machine Learning - ICML ’05, 2005, pp. 1036–1043.
  • X. Zhu, Semi-Supervised Learning Literature Survey, Computer Sciences Technical Report 1530, University of Wisconsin–Madison, 2005.
  • X. Zhu, A. Goldberg, Introduction to semi-supervised learning, Synth. Lect. Artif. Intell. Mach. Learn. 3 (1) (2009) 1–130.