Accelerating Data-Intensive Genome Analysis in the Cloud

Nabeel M Mohamed, Heshan Lin, Wu-chun Feng

Semantic Scholar (2013)

Abstract
Next-generation sequencing (NGS) technologies have made it possible to rapidly sequence the human genome, heralding a new era of health-care innovations based on personalized genetic information. However, these NGS technologies generate data at a rate that far outstrips Moore’s Law. Consequently, analyzing this exponentially increasing data deluge requires enormous computational and storage resources, resources that many life science institutions do not have access to. As such, cloud computing has emerged as an obvious, but still nascent, solution. In this paper, we present SeqInCloud, our highly scalable implementation of a genome analysis pipeline on the Microsoft Hadoop on Azure (HoA) public cloud. Together with a parallel implementation of GATK on Hadoop, we evaluate the potential of using cloud computing for large-scale DNA analysis and present a detailed study on efficiently utilizing cloud resources for data-intensive, life-science applications with SeqInCloud.