Variable selection with FDR control for noisy data -- an application to screening metabolites that are associated with breast and colorectal cancer

arXiv (Cornell University) (2023)

Abstract
The rapidly expanding field of metabolomics presents an invaluable resource for understanding the associations between metabolites and various diseases. However, the high dimensionality, presence of missing values, and measurement errors associated with metabolomics data can present challenges in developing reliable and reproducible methodologies for disease association studies. Therefore, there is a compelling need to develop robust statistical methods that can navigate these complexities to achieve reliable and reproducible disease association studies. In this paper, we focus on developing such a methodology with an emphasis on controlling the False Discovery Rate during the screening of mutual metabolomic signals for multiple disease outcomes. We illustrate the versatility and performance of this procedure in a variety of scenarios, dealing with missing data and measurement errors. As a specific application of this novel methodology, we target two of the most prevalent cancers among US women: breast cancer and colorectal cancer. By applying our method to the Women's Health Initiative data, we successfully identify metabolites that are associated with either or both of these cancers, demonstrating the practical utility and potential of our method in identifying consistent risk factors and understanding shared mechanisms between diseases.
Keywords
screening metabolites, variable selection, FDR control, noisy data