M3C: A Monte Carlo reference-based consensus clustering algorithm

bioRxiv (2019)

Abstract
Class discovery algorithms use genome-wide data to stratify patients into classes for precision medicine. A widely applied method is consensus clustering; however, the approach is prone to overfitting and to identifying false positives. These problems arise because null reference distributions are not considered when selecting the number of classes (K). As a solution, we developed a reference-based consensus clustering algorithm called Monte Carlo consensus clustering (M3C). M3C uses a Monte Carlo simulation to generate null distributions along the range of K, which are used to choose K and to reject the null hypothesis. The M3C method clearly removes the limitations of consensus clustering, as demonstrated both in simulations and in an investigation of The Cancer Genome Atlas expression data. M3C can quantify structural relationships between clusters and uses self-tuning spectral clustering to analyse complex structures. In parallel, we developed clusterlab, a flexible Gaussian cluster simulator for testing class discovery tools. Clusterlab can simulate high-dimensional Gaussian clusters with precise control over spacing, variance, and size. This computational framework should prove useful in the development of precision medicine.
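The reference-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the M3C implementation: the stability score (the fraction of sample pairs whose consensus value is ambiguous), the null model (a single multivariate Gaussian with the data's mean and covariance), the small k-means routine, and all function names are assumptions chosen for illustration. For each K, the score on the real data is compared with scores from Monte Carlo null datasets to give an empirical p-value.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=10):
    """Lloyd's k-means with farthest-point initialisation (illustrative helper)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def ambiguity_score(X, k, rng, n_resamples=30, frac=0.8, lo=0.1, hi=0.9):
    """Consensus clustering on subsamples; returns the fraction of sample
    pairs whose co-clustering frequency lies strictly between lo and hi.
    Lower means more stable. (Assumed score, not taken from the abstract.)"""
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    m = int(frac * n)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(X[idx], k, rng)
        same = labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += same
    iu = np.triu_indices(n, k=1)
    valid = sampled[iu] > 0
    consensus = together[iu][valid] / sampled[iu][valid]
    return float(np.mean((consensus > lo) & (consensus < hi)))

def simulate_null(X, rng):
    """Null reference with no cluster structure: one Gaussian sharing the
    data's mean and covariance (an assumption of this sketch)."""
    return rng.multivariate_normal(X.mean(axis=0), np.cov(X, rowvar=False), size=len(X))

def monte_carlo_k(X, k_range=(2, 3, 4), n_sims=20, seed=0):
    """Score each K on the real data and on n_sims null datasets, then
    return the real scores and an empirical p-value per K."""
    rng = np.random.default_rng(seed)
    real = {k: ambiguity_score(X, k, rng) for k in k_range}
    null = {k: [] for k in k_range}
    for _ in range(n_sims):
        X0 = simulate_null(X, rng)
        for k in k_range:
            null[k].append(ambiguity_score(X0, k, rng))
    pvals = {k: (1 + sum(s <= real[k] for s in null[k])) / (1 + n_sims)
             for k in k_range}
    return real, pvals

# Toy data: three well-separated 2-D Gaussian clusters of 20 samples each.
demo_rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * demo_rng.standard_normal((20, 2)) for c in true_centers])

n_sims = 20
real, pvals = monte_carlo_k(X, k_range=(2, 3, 4), n_sims=n_sims, seed=0)
for k in sorted(real):
    print(f"K={k}: ambiguity={real[k]:.3f}, empirical p={pvals[k]:.3f}")
```

On this toy input the clustering at K = 3 is perfectly stable, so its empirical p-value reaches the smallest value that 20 simulations allow. A K whose real-data score is no better than the null scores would be treated as unsupported, which is the overfitting check that plain consensus clustering lacks.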