betaclust: a family of mixture models for beta valued DNA methylation data

arxiv(2022)

Abstract
The DNA methylation process has been extensively studied for its role in cancer. Promoter cytosine-guanine dinucleotide (CpG) island hypermethylation has been shown to silence tumour suppressor genes. Identifying the differentially methylated CpG (DMC) sites between benign and tumour samples can help understand the disease. The EPIC microarray quantifies the methylation level at a CpG site as a beta value which lies within [0,1). There is a lack of suitable methods for modelling the beta values in their innate form. The DMCs are identified via multiple t-tests but this can be computationally expensive. Also, arbitrary thresholds are often selected and used to identify the methylation state of a CpG site. We propose a family of novel beta mixture models (BMMs) which use a model-based clustering approach to cluster the CpG sites in their innate beta form to (i) objectively identify methylation state thresholds and (ii) identify the DMCs between different samples. The family of BMMs employs different parameter constraints that are applicable to different study settings. Parameter estimation proceeds via an EM algorithm, with a novel approximation during the M-step providing tractability and computational feasibility. Performance of the BMMs is assessed through a thorough simulation study, and the BMMs are used to analyse a prostate cancer dataset and an esophageal squamous cell carcinoma dataset. The BMM approach objectively identifies methylation state thresholds and identifies more DMCs between the benign and tumour samples in both cancer datasets than conventional methods, in a computationally efficient manner. The empirical cumulative distribution function of the DMCs related to genes implicated in carcinogenesis indicates hypermethylation of CpG sites in the tumour samples in both cancer settings. An R package betaclust is provided to facilitate the use of the developed BMMs.
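
To make the clustering idea concrete, the following is a minimal sketch of fitting a beta mixture to beta values by EM. It is not the betaclust implementation, and it does not reproduce the authors' M-step approximation: here the M-step is replaced by weighted moment matching, a simple stand-in that keeps the example short. The function names, the simulated shape parameters, and the three-state setup (hypomethylated, hemimethylated, hypermethylated) are assumptions made for illustration only.

```python
import math
import random


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def fit_beta_mixture(x, k, n_iter=50, eps=1e-6):
    """Fit a k-component beta mixture by EM.

    Assumption: the M-step uses weighted moment matching rather than the
    approximation described in the paper.
    """
    # Clip away from 0 and 1 so the log terms stay finite.
    x = [min(max(v, eps), 1.0 - eps) for v in x]
    n = len(x)

    # Initialise responsibilities by cutting the sorted values into k blocks.
    order = sorted(range(n), key=lambda i: x[i])
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][min(rank * k // n, k - 1)] = 1.0

    weights, params = [], []
    for _ in range(n_iter):
        # M-step: match each component's weighted mean and variance.
        weights, params = [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty clusters
            mean = sum(r[j] * v for r, v in zip(resp, x)) / nj
            mean = min(max(mean, eps), 1.0 - eps)
            var = sum(r[j] * (v - mean) ** 2 for r, v in zip(resp, x)) / nj
            var = max(var, 1e-8)
            # For Beta(a, b): a + b = mean * (1 - mean) / var - 1.
            total = max(mean * (1.0 - mean) / var - 1.0, 1e-3)
            params.append((mean * total, (1.0 - mean) * total))
            weights.append(nj / n)

        # E-step: posterior membership probabilities, computed in log space.
        for i, v in enumerate(x):
            logs = [math.log(weights[j]) + beta_logpdf(v, *params[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(val - top) for val in logs]
            norm = sum(probs)
            resp[i] = [p / norm for p in probs]

    return weights, params, resp


# Simulated beta values for three hypothetical methylation states.
rng = random.Random(0)
data = (
    [rng.betavariate(2, 20) for _ in range(200)]     # hypomethylated
    + [rng.betavariate(10, 10) for _ in range(200)]  # hemimethylated
    + [rng.betavariate(20, 2) for _ in range(200)]   # hypermethylated
)
weights, params, resp = fit_beta_mixture(data, k=3)

# Hard-assign each site to its most probable component.
labels = [max(range(3), key=lambda j: r[j]) for r in resp]
```

Under this sketch, the boundaries between the hard-assigned clusters play the role of the methylation state thresholds mentioned in the abstract: they come from the fitted mixture rather than from a cutoff chosen in advance. Because moment matching is not an exact maximum-likelihood update, the log-likelihood is not guaranteed to increase at every iteration, which is one reason a purpose-built M-step is preferable in practice.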