Hybrid One-Class Ensemble For High-Dimensional Data Classification
Intelligent Information and Database Systems, ACIIDS 2016, Pt II (2016)
Abstract
Advances in high-throughput techniques, such as gene microarrays and protein chips, have had a major impact on contemporary biology and medicine. Because of the high dimensionality and complexity of the data, it is impossible to analyze them manually. Machine learning techniques therefore play an important role in dealing with such data. In this paper we propose a one-class approach to classifying microarrays. Unlike canonical classifiers, these models rely only on objects drawn from a single class distribution. They distinguish observations belonging to the given class from any other possible states of the object that were unseen during training. Although they have less information with which to discriminate between classes, one-class models can easily learn the specific properties of a given dataset and are robust to difficulties embedded in the nature of the data. We show that one-class ensembles can give results as good as those of canonical multi-class classifiers, while also handling imbalanced class distributions and unexpected noise in the data. To cope with the high dimensionality of the feature space, we propose a novel hybrid one-class ensemble that combines weighted Bagging and Random Subspaces. Experimental investigations carried out on public datasets demonstrate the usefulness of the proposed approach.
Keywords
Machine learning, One-class classification, Classifier ensemble, High-dimensional data, Bioinformatics