A sparse logistic mixture model for disease subtyping with clinical and genetic data

HAL (Le Centre pour la Communication Scientifique Directe) (2019)

Abstract
This work proposes an original method for disease subtyping from both longitudinal clinical variables and genetic markers via a mixture of regressions model, with logistic weights that are functions of a potentially large number of genetic variables. To address such large-scale problems, variable selection is an essential step. We thus propose to discard genetic variables that may not be relevant for clustering by maximizing a penalized likelihood via a Classification Expectation Maximization algorithm. The proposed method is validated on simulations. The approach is applied to a data set from a cohort of Parkinson's disease patients. Several subtypes of the disease, as well as genetic variants potentially having a role in this typology, have been identified.

Identifying new genetic associations in non-Mendelian complex diseases is an increasingly difficult challenge. Yet a significant part of the heritability of these diseases seems to remain unexplained. This missing heritability could be explained by the existence of subtypes involving different genetic factors. Taking genetic information into account in clinical trials can therefore help guide the subtyping of a complex disease. Most methods dealing with multiple sources of information rely on data transformation, with two main tendencies regarding disease subtyping in that situation: i) the clustering of clinical data followed by posterior genetic analyses, and ii) the clustering of both clinical and genetic variables. Both face limitations that we propose to overcome. This work proposes an original method for disease subtyping from both longitudinal clinical variables and high-dimensional genetic markers via a sparse mixture of regressions model. The added value of our approach lies in its interpretability, in two respects. First, our model links clinical and genetic data in their respective original form (i.e., without transformation) and does not need post-processing to map back to the original information when interpreting the subtypes. Second, it can address large-scale problems thanks to a variable selection step that discards genetic variables that may not be relevant for subtyping. The proposed method is validated on simulations. A dataset from a cohort of Parkinson's disease patients was also analyzed. Several subtypes of the disease, as well as genetic variants potentially having a role in this typology, have been identified.
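The abstract describes the estimation strategy only at a high level: a mixture of regressions whose logistic weights depend on genetic variables, with a penalized likelihood maximized by Classification EM. The following is a minimal NumPy sketch of that kind of procedure, not the authors' implementation. It assumes Gaussian linear trajectories (intercept and slope over visit times shared by all subjects), softmax mixing weights with an L1 (lasso) penalty on the genetic coefficients, and a proximal-gradient (ISTA) solver for the weights step. These modelling choices, the helper names `cem_sparse_mixture` and `fit_sparse_gating`, the slope-quantile initialization and the simulated data are all hypothetical.

```python
import numpy as np

# Minimal sketch under assumed modelling choices (not the authors' code).


def log_softmax(scores):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    return scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))


def fit_sparse_gating(G, labels, K, lam, n_iter=500):
    """Hypothetical helper: L1-penalized multinomial logistic regression of hard
    cluster labels on the genetic matrix G, fitted by proximal gradient (ISTA).
    Intercepts (last row of W) are not penalized."""
    n, p = G.shape
    Y = np.eye(K)[labels]                                 # one-hot labels, (n, K)
    Ga = np.column_stack([G, np.ones(n)])                 # append intercept column
    step = 1.0 / (0.5 * np.linalg.norm(Ga, 2) ** 2 / n)   # 1 / Lipschitz bound
    W = np.zeros((p + 1, K))
    for _ in range(n_iter):
        grad = Ga.T @ (np.exp(log_softmax(Ga @ W)) - Y) / n
        W = W - step * grad
        # Soft-thresholding: proximal step for lam * ||W[:p]||_1.
        W[:p] = np.sign(W[:p]) * np.maximum(np.abs(W[:p]) - step * lam, 0.0)
    return W


def cem_sparse_mixture(y, t, G, K, lam=0.1, max_iter=50):
    """Hypothetical Classification EM for a mixture of Gaussian linear
    trajectories whose mixing weights are a sparse softmax of genetic covariates.
    y: (n, T) clinical scores at shared visit times t: (T,); G: (n, p)."""
    n, T = y.shape
    X = np.column_stack([np.ones(T), t])                  # intercept + slope design
    Ga = np.column_stack([G, np.ones(n)])
    # Hypothetical initialization: split subjects on quantiles of their own OLS slope.
    slopes = np.linalg.lstsq(X, y.T, rcond=None)[0][1]
    labels = np.digitize(slopes, np.quantile(slopes, np.linspace(0, 1, K + 1)[1:-1]))
    beta, sigma2 = np.zeros((K, 2)), np.ones(K)
    for _ in range(max_iter):
        # M-step (regressions): all subjects share X, so OLS on the cluster's
        # mean trajectory equals pooled least squares over its members.
        for k in range(K):
            yk = y[labels == k]
            if len(yk) == 0:
                continue                                  # empty cluster: keep old fit
            beta[k] = np.linalg.lstsq(X, yk.mean(axis=0), rcond=None)[0]
            sigma2[k] = np.mean((yk - X @ beta[k]) ** 2)
        # M-step (weights): sparse logistic gating on the genetic variables.
        W = fit_sparse_gating(G, labels, K, lam)
        log_pi = log_softmax(Ga @ W)                      # (n, K) log mixing weights
        # E-step: unnormalized log posterior of each cluster for each subject.
        resid2 = ((y[:, None, :] - (beta @ X.T)[None]) ** 2).sum(axis=2)
        loglik = -0.5 * T * np.log(2 * np.pi * sigma2) - resid2 / (2 * sigma2)
        # C-step: hard assignment to the most probable cluster.
        new_labels = np.argmax(log_pi + loglik, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, beta, sigma2, W


# Hypothetical simulated data: only genetic variables 0 and 1 drive the subtype.
rng = np.random.default_rng(0)
n, T, p = 200, 5, 20
t = np.arange(T, dtype=float)
G = rng.standard_normal((n, p))
z = (rng.random(n) < 1 / (1 + np.exp(-3 * (G[:, 0] - G[:, 1])))).astype(int)
true_beta = np.array([[0.0, 1.0], [5.0, -1.0]])          # (intercept, slope) per subtype
y = true_beta[z] @ np.column_stack([np.ones(T), t]).T + 0.5 * rng.standard_normal((n, T))

labels, beta, sigma2, W = cem_sparse_mixture(y, t, G, K=2, lam=0.1)
selected = np.flatnonzero(np.any(W[:p] != 0, axis=1))    # genetic variables kept
agreement = max(np.mean(labels == z), np.mean(labels != z))  # up to label swap
print(selected, agreement)
```

Each iteration refits the cluster trajectories and the sparse weights on the current partition, scores every subject under every cluster, and hard-assigns it to the most probable one, stopping once the partition no longer changes. Genetic variables whose coefficients are shrunk to exactly zero in every cluster are the ones discarded; the penalty level `lam` controls how aggressive that selection is and would normally be tuned, for example by a model-selection criterion.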
Keywords
sparse logistic mixture model, disease, clinical