An insight on complexity measures and classification in microarray data

2015 International Joint Conference on Neural Networks (IJCNN) (2015)

Abstract
Microarray data classification has typically been seen as a difficult challenge for machine learning researchers, mainly because of its high feature dimensionality combined with a small sample size. However, this type of data presents other complications, such as overlap between classes, dataset shift, class imbalance, non-linearity, or features extracted under extremely different distributions. This paper analyzes in depth the theoretical complexity of several popular binary datasets by means of complexity measures, and then relates it to the empirical results obtained by four widely used classifiers. Two situations are covered: datasets with only a training set, and datasets originally divided into training and test sets. In both cases it is shown that a correlation exists between the complexity measures and the actual error rates, which may help in deciding how to deal with a given dataset in the future. Finally, we present a case study on the Prostate dataset, improving the test classification accuracy from 53% to 97%.
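To make the idea of relating a complexity measure to error rates concrete, here is a minimal sketch. The abstract does not say which complexity measures or classifiers the paper uses, so the choice of Fisher's discriminant ratio (a measure commonly used in the data complexity literature) is an assumption, and the function names `fisher_ratio_f1` and `correlation` are hypothetical.

```python
import numpy as np


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio over all features (binary case).

    Assumption: this is one commonly used complexity measure, not
    necessarily one of the measures used in the paper.
    Higher values suggest that at least one feature separates the classes well.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")

    a = X[y == classes[0]]
    b = X[y == classes[1]]

    # Per-feature squared difference of class means.
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    # Per-feature sum of within-class (population) variances.
    den = a.var(axis=0) + b.var(axis=0)

    # Simplification in this sketch: features that are constant within
    # both classes (den == 0) are assigned a ratio of 0.
    ratios = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(ratios.max())


def correlation(measure_values, error_rates):
    """Pearson correlation between per-dataset measure values and error rates."""
    return float(np.corrcoef(measure_values, error_rates)[0, 1])
```

Under these assumptions, one would compute `fisher_ratio_f1` once per dataset, collect the error rate of a classifier on each dataset, and pass the two lists to `correlation`. A strongly negative value would indicate that datasets with a higher discriminant ratio tend to have lower error.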
Keywords
complexity measures, microarray data classification, machine learning, theoretical complexity, binary datasets, classifiers, training set, test sets, error rates, Prostate dataset, test classification accuracy