Machine Learning Approaches For The Prediction Of Obesity Using Publicly Available Genetic Profiles

2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)(2017)

引用 39|浏览17
暂无评分
摘要
This paper presents a novel approach based on the analysis of genetic variants from publicly available genetic profiles and the manually curated database, the National Human Genome Research Institute Catalog. Using data science techniques, genetic variants are identified in the collected participant profiles and then indexed as risk variants in the National Human Genome Research Institute Catalog. Indexed genetic variants or Single Nucleotide Polymorphisms are used as inputs in various machine learning algorithms for the prediction of obesity. Body mass index status of participants is divided into two classes, Normal Class and Risk Class. Dimensionality reduction tasks are performed to generate a set of principal variables - 13 SNPs - for the application of various machine learning methods. The models are evaluated using receiver operator characteristic curves and the area under the curve. Machine learning techniques including gradient boosting, generalized linear model, classification and regression trees, k-nearest neighbours, support vector machines, random forest and multilayer perceptron neural network are comparatively assessed in terms of their ability to identify the most important factors among the initial 6622 variables describing genetic variants, age and gender, to classify a subject into one of the body mass index related classes defined in this study. Our simulation results indicated that support vector machine generated the highest area under the curve value of 90.5%.
更多
查看译文
关键词
Data Science, Feature Selection, Genetics, Machine Learning, Obesity, Receiver Operating Characteristic Curve, Single Nucleotide Polymorphisms
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要