Enhancing Selection of Alcohol Consumption Associated Genes by Random Forest.

The British Journal of Nutrition (2024)

Abstract
Machine learning methods have been used to identify omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve the identification of alcohol-associated transcriptomic markers. In this study, we analyzed array-based, whole-blood-derived expression data for 17,873 gene transcripts in 5,508 Framingham Heart Study participants. Using the Boruta algorithm, a supervised Random Forest (RF)-based feature selection method, we selected 25 alcohol-associated transcripts. In a testing set (30% of all study participants), the AUCs (areas under the receiver operating characteristic curve) of these 25 transcripts were 0.73, 0.69, and 0.66 for nondrinkers vs. moderate drinkers, nondrinkers vs. heavy drinkers, and moderate drinkers vs. heavy drinkers, respectively. The AUCs of the transcripts selected by the Boruta method were comparable to those of transcripts identified using conventional linear regression models; e.g., the AUCs of 1,985 transcripts identified by conventional linear regression models (false discovery rate < 0.05) were 0.72, 0.68, and 0.68, respectively. With Bonferroni correction for the 25 Boruta-selected transcripts and three CVD risk factors (i.e., at P < 0.05/(25 × 3) ≈ 6.7e-4), we observed that 13 transcripts were associated with obesity, 3 transcripts with type 2 diabetes, and 1 transcript with hypertension. For example, alcohol consumption was inversely associated with the expression of DOCK4, IL4R, and SORT1; DOCK4 and SORT1 were positively associated with obesity, and IL4R was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.
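The workflow described in the abstract (RF-based feature selection, then AUC on a held-out 30% testing set) can be illustrated with a minimal sketch. This is not the authors' code and not the full Boruta algorithm: it keeps only the core idea of comparing each feature's Random Forest importance with that of permuted "shadow" copies, and it omits Boruta's statistical test and iterative feature removal. The data are synthetic, and the function name `shadow_select` and all parameter values are assumptions chosen for illustration, using scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def shadow_select(X, y, n_rounds=5, min_hit_frac=0.8, seed=0):
    """Simplified Boruta-style selection (hypothetical helper).

    Returns indices of features whose RF importance exceeds the largest
    shadow-feature importance in at least min_hit_frac of the rounds.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for r in range(n_rounds):
        # Shadow features: each column permuted independently,
        # which destroys any association with the outcome y.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + r)
        forest.fit(np.hstack([X, shadows]), y)
        imp = forest.feature_importances_
        # A "hit" means the real feature beat the best shadow this round.
        hits += imp[:n_features] > imp[n_features:].max()
    return np.flatnonzero(hits >= min_hit_frac * n_rounds)


# Synthetic stand-in for expression data: 600 samples, 30 features,
# of which the first 5 columns carry signal (shuffle=False keeps them first).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=5, n_redundant=0,
    class_sep=1.5, shuffle=False, random_state=0,
)

# 70% training / 30% testing split, mirroring the proportion in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Select features on the training set only, then score on the testing set.
selected = shadow_select(X_train, y_train)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train[:, selected], y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test[:, selected])[:, 1])

print("selected feature indices:", selected.tolist())
print("test-set AUC:", round(auc, 3))
```

Selecting features on the training portion alone keeps the testing-set AUC an honest estimate; a three-group outcome such as nondrinker, moderate drinker, and heavy drinker would be handled by repeating the binary comparison for each pair of groups.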