Can classification performance be predicted by complexity measures? A study using microarray data

Knowl. Inf. Syst. (2016)

Abstract
Data complexity analysis enables an understanding of whether classification performance could be affected, not by algorithm limitations, but by intrinsic data characteristics. Microarray datasets based on high numbers of gene expressions combined with small sample sizes represent a particular challenge for machine learning researchers. This type of data also has other particularities that may negatively affect the generalization capacity of classifiers, such as overlaps between classes and class imbalance. Making use of several complexity measures, we analyzed the intrinsic complexity of several microarray datasets with and without feature selection and then explored the connection with the empirical results obtained by four widely used classifiers. Experimental results for 21 binary and multiclass datasets demonstrate that a correlation exists between microarray data complexity and the classification error rates.
Keywords
Data complexity measures, Classification, Microarray data, Feature selection, Filters