Fast in-memory cluster computing of sizeable microarray using Spark

2016 International Conference on Recent Trends in Information Technology (ICRTIT) (2016)

Abstract
Microarray technology is one of the emerging technologies in genetic research; biologists use it to monitor the expression levels of genes in a given organism. Microarray experiments are used to investigate genome-wide expression changes in health care. The colossal amount of raw gene expression data leads to computational and analytical challenges, including feature selection and classification of the dataset into the correct group or class. In this paper, a mutual information feature selection method based on the Spark framework (sf-MI) is proposed to select the pertinent features. After feature selection, classifiers based on the Spark framework, namely Support Vector Machine (sf-SVM) and Logistic Regression (sf-LoR), are applied to classify the microarray dataset. A detailed comparative analysis in terms of execution time and accuracy is presented for these feature selection and classification methods, comparing the Spark-based implementations with their counterparts on a conventional system.
Keywords
Big data, Hadoop, Spark, Microarray, Resilient Distributed Dataset, sf-LoR, sf-MI, sf-SVM
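
The mutual information feature scoring named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' sf-MI implementation and does not use Spark: the function names (`discretize`, `mutual_information`, `select_top_k`), the equal-width binning, and the default of three bins are all assumptions made for illustration. Each gene (feature) is discretized, scored by its mutual information with the class label, and the k highest-scoring features are kept.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning of continuous expression values (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; put everything in one bin.
        return [0] * len(values)
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_top_k(samples, labels, k, bins=3):
    """Return indices of the k features with the highest MI against the labels."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        score = mutual_information(discretize(column, bins), labels)
        scores.append((score, j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]


# Toy data (made up): 4 samples, 3 features; only feature 0 tracks the label.
samples = [
    [0.1, 5.0, 1.0],
    [0.2, 5.0, 9.0],
    [0.9, 5.0, 1.1],
    [1.0, 5.0, 9.1],
]
labels = [0, 0, 1, 1]
selected = select_top_k(samples, labels, k=1)
```

Because every feature is scored independently of the others, the per-feature loop is the natural unit of work to distribute across a cluster; how the paper partitions this work on Spark is not stated in the abstract.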