DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2017

Abstract
Sequence alignment algorithms are a basic and critical component of many bioinformatics fields. With the rapid development of sequencing technology, fast-growing reference database volumes and longer query sequences pose new challenges for sequence alignment, because the algorithm's time and space complexity are prohibitively high. In this paper, we present DSA, a scalable distributed sequence alignment system that employs Spark to process sequence data in a horizontally scalable distributed environment and uses a data-parallel strategy based on Single Instruction Multiple Data (SIMD) instructions to parallelize the algorithm on each core of a worker node. The experimental results demonstrate that 1) DSA achieves up to 201x speedup over SparkSW, and 2) DSA scales well, achieving near-linear speedup as the number of nodes in the cluster increases.
Keywords
distributed sequence alignment,Apache Spark,SIMD instruction,Alluxio,Scalability
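
The two levels of parallelism described in the abstract can be illustrated with a minimal sketch. It is not the DSA implementation. It assumes the alignment algorithm is Smith-Waterman local alignment (suggested only by the name of the SparkSW baseline), uses illustrative scoring parameters, and stands in for Spark with a plain in-process partitioning step. All function names (`sw_score`, `partition`, `align_partition`, `best_hit`) are hypothetical. Pure Python cannot issue SIMD instructions, so the inner loop is written in scalar form; it marks the computation that a SIMD version would vectorize.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence); hypothetical record layout


def sw_score(query: str, ref: str, match: int = 2,
             mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(ref) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]
        # Scalar inner loop: this is the per-cell work that a SIMD
        # implementation would compute for several cells at once.
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


def partition(items: List[Record], n_parts: int) -> List[List[Record]]:
    """Split the reference database round-robin, one chunk per worker."""
    return [items[i::n_parts] for i in range(n_parts)]


def align_partition(query: str, part: List[Record]) -> List[Tuple[str, int]]:
    """Work done by one worker: score the query against its chunk."""
    return [(name, sw_score(query, seq)) for name, seq in part]


def best_hit(query: str, database: List[Record],
             n_parts: int = 2) -> Tuple[str, int]:
    """Map over partitions, then reduce to the highest-scoring reference."""
    hits = [hit for part in partition(database, n_parts)
            for hit in align_partition(query, part)]
    return max(hits, key=lambda h: h[1])


if __name__ == "__main__":
    db = [("r1", "TTTT"), ("r2", "GGACGTGG"), ("r3", "ACG")]
    print(best_hit("ACGT", db))
```

In a distributed setting, the partitions would be processed on separate worker nodes and only the per-partition hits would be gathered for the final reduction; here the map step runs sequentially to keep the sketch self-contained.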