Hierarchical cluster analysis of SAGE data for cancer profiling.

BIOKDD'01: Proceedings of the 1st International Conference on Data Mining in Bioinformatics (2001)

Abstract
In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer at the sub-cellular level. The data are extremely high-dimensional, and because of the way they are measured they contain many errors as well as missing values, which challenges any clustering algorithm. We therefore introduce special pre-processing techniques that reduce these errors and restore missing data. These techniques are tailored to the process that generates the data and make only very conservative changes. Furthermore, we present a new subspace selection technique that uses the Wilcoxon test to identify a relevant subset of attributes (genes). This general technique can be applied to select subspaces for clustering whenever some high-level categories of interest are known for the data (such as cancerous and non-cancerous). Finally, we discuss the results of applying the clustering algorithm OPTICS to the SAGE data, before and after our pre-processing steps.
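The abstract does not state which Wilcoxon variant, significance threshold, or multiple-testing correction the authors use. The following is therefore only a minimal sketch under stated assumptions, not the paper's procedure. It applies a two-sample Wilcoxon rank-sum test (equivalently, Mann-Whitney) to each gene, comparing cancerous and non-cancerous samples with a normal approximation (average ranks for ties, no tie correction in the variance). It keeps genes whose two-sided p-value falls below a hypothetical threshold `alpha`. The names `rank_sum_z`, `two_sided_p`, and `select_genes` are illustrative.

```python
import math


def rank_sum_z(x, y):
    """Wilcoxon rank-sum z statistic for sample x versus sample y.

    Assumption: normal approximation, tied values get their average rank,
    and no tie correction is applied to the variance.
    """
    # Pool both samples, remembering which group each value came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    # W is the sum of the ranks belonging to sample x.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)


def two_sided_p(z):
    """Two-sided p-value from a standard normal z: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def select_genes(expression, is_cancer, alpha=0.05):
    """Return genes whose levels differ between the two classes (hypothetical helper).

    expression: dict mapping gene name -> list of values, one per sample.
    is_cancer:  list of booleans, one per sample (the known high-level category).
    alpha:      assumed significance threshold; no multiple-testing correction.
    """
    selected = []
    for gene, values in expression.items():
        cancer = [v for v, c in zip(values, is_cancer) if c]
        normal = [v for v, c in zip(values, is_cancer) if not c]
        p = two_sided_p(rank_sum_z(cancer, normal))
        if p < alpha:
            selected.append((p, gene))
    # Most significant genes first.
    return [gene for _, gene in sorted(selected)]
```

For example, with six cancerous and six non-cancerous samples, a gene that is consistently higher in the cancerous samples is kept, while a constant gene or an interleaved gene is discarded. The reduced gene set would then serve as the subspace for clustering. In a real analysis with thousands of genes, some correction for multiple testing would usually be considered.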