Singleton Sieving: Overcoming the Memory/Speed Trade-Off in Exascale k-mer Analysis.

Hunter McCoy, Steven A. Hofmeyr, Katherine A. Yelick, Prashant Pandey

ACDA (2023)

Abstract
Traditional filter data structures, such as Bloom filters, do not offer the features that modern high-performance data analytics applications need to perform complex data analysis tasks efficiently. For example, MetaHipMer, a de novo metagenome assembler, can use filters to weed out singleton k-mers and reduce memory usage by 30%-70%. However, the filter needs the ability to associate values with k-mers in order to perform the analysis in a single communication pass. Bloom filters do not support value associations and force the application to perform an extra communication pass, thereby increasing the run time. Therefore, MetaHipMer faces a trade-off between memory and speed due to the limited capabilities of traditional filters.

In this paper, we overcome the memory/speed trade-off in MetaHipMer by integrating a GPU-based, feature-rich filter, the Two-Choice filter (TCF), into the MetaHipMer pipeline. The TCF uses key-value association to approximately store k-mers with extensions. This allows MetaHipMer to perform k-mer analysis on the GPUs in a single communication pass. Our empirical analysis shows a 50% reduction in memory usage in k-mer analysis on each node in MetaHipMer without any effect on the overall run time or assembly quality. The memory reduction in turn results in a 43% reduction in the number of nodes required to assemble datasets and enables MetaHipMer to scale to much larger datasets.
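To make the single-pass idea concrete, the following is a minimal CPU-side sketch in Python of singleton sieving with a key-value filter. It is not the paper's GPU implementation: the class `TwoChoiceFilterSketch`, the function `sieve_kmers`, the bucket and fingerprint sizes, and the fallback on a full bucket are all assumptions made for illustration. The filter keeps only a short fingerprint plus the first-seen extension for each k-mer; a k-mer is promoted to the exact table only when it is seen a second time, so singletons never occupy a full table entry.

```python
import hashlib


class TwoChoiceFilterSketch:
    """Toy two-choice filter storing (fingerprint, value) pairs, not full keys.

    Hypothetical illustration only; sizes and hashing are arbitrary choices.
    """

    def __init__(self, num_buckets=1024, slots_per_bucket=16, fp_bits=16):
        self.num_buckets = num_buckets
        self.slots = slots_per_bucket
        self.fp_mask = (1 << fp_bits) - 1
        self.buckets = [[] for _ in range(num_buckets)]

    def _hashes(self, key):
        # Derive two candidate buckets and a fingerprint from one digest.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[0:4], "little") % self.num_buckets
        h2 = int.from_bytes(digest[4:8], "little") % self.num_buckets
        fp = int.from_bytes(digest[8:12], "little") & self.fp_mask
        return h1, h2, fp

    def lookup(self, key):
        # Returns the stored value, or None if no matching fingerprint exists.
        # A fingerprint collision can return another key's value (approximate).
        h1, h2, fp = self._hashes(key)
        for b in (h1, h2):
            for stored_fp, value in self.buckets[b]:
                if stored_fp == fp:
                    return value
        return None

    def insert(self, key, value):
        # Two-choice placement: use the emptier of the two candidate buckets.
        h1, h2, fp = self._hashes(key)
        target = min((h1, h2), key=lambda b: len(self.buckets[b]))
        if len(self.buckets[target]) >= self.slots:
            return False  # both candidate buckets are full
        self.buckets[target].append((fp, value))
        return True


def sieve_kmers(kmers_with_ext, tcf):
    """Single pass over (kmer, extension) pairs; returns the exact table.

    The table maps kmer -> [count, extensions] and holds only k-mers that
    were seen at least twice (barring filter false positives or overflow).
    """
    table = {}
    for kmer, ext in kmers_with_ext:
        if kmer in table:
            table[kmer][0] += 1
            table[kmer][1].append(ext)
            continue
        first_ext = tcf.lookup(kmer)
        if first_ext is None:
            # First sighting: remember it cheaply in the filter.
            if not tcf.insert(kmer, ext):
                table[kmer] = [1, [ext]]  # assumed fallback when the filter is full
        else:
            # Second sighting: promote, recovering the first extension
            # from the filter instead of making another pass over the data.
            table[kmer] = [2, [first_ext, ext]]
    return table
```

A filter without value association would only report that a k-mer had been seen before; the extension from the first sighting would be lost and would have to be recovered in a second pass. Storing the extension alongside the fingerprint is what lets the sketch above finish in one pass, at the cost of occasionally promoting a singleton with a wrong extension when fingerprints collide.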
Keywords
memory/speed, exascale, trade-off