SciDP: Support HPC and Big Data Applications via Integrated Scientific Data Processing

2018 IEEE International Conference on Cluster Computing (CLUSTER) (2018)

Abstract
Modern High Performance Computing (HPC) applications, such as Earth science simulations, produce large amounts of data as computing power surges, while big data applications have become more compute-intensive as analysis algorithms grow more sophisticated. Advanced applications therefore need both HPC and big data technologies, which creates a demand for integrated system support. In this study, we introduce Scientific Data Processing (SciDP), which supports both HPC and big data applications through integrated scientific data processing. SciDP can directly process scientific data stored on a Parallel File System (PFS), which is typically deployed in an HPC environment, from a big data programming environment running atop the Hadoop Distributed File System (HDFS). SciDP seamlessly integrates PFS, HDFS, and the widely used R data analysis system to support highly efficient processing of scientific data. It exploits the strengths of both PFS and HDFS for fast data transfer, overlaps computation with data access, and integrates R into the data transfer process. Experimental results show that SciDP accelerates analysis and visualization of a production NASA Center for Climate Simulation (NCCS) climate and weather application by 6x to 8x compared with existing solutions.
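
The idea of overlapping computation with data access can be illustrated with a minimal Python sketch. This is not SciDP's implementation, which the abstract does not detail: `fetch_chunk`, `analyze`, and `run_overlapped` are hypothetical names, the sleeps stand in for PFS reads and R analysis, and a single background thread prefetches the next chunk while the current one is processed.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_chunk(i):
    """Hypothetical stand-in for reading chunk i of a dataset from storage."""
    time.sleep(0.05)  # simulated I/O latency (assumption)
    return list(range(i * 4, i * 4 + 4))


def analyze(chunk):
    """Hypothetical stand-in for the analysis step applied to one chunk."""
    time.sleep(0.05)  # simulated compute time (assumption)
    return sum(chunk)


def run_overlapped(num_chunks):
    """Analyze chunk i while chunk i + 1 is fetched in the background."""
    results = []
    if num_chunks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_chunk, 0)  # start the first read
        for i in range(num_chunks):
            chunk = pending.result()  # wait for the chunk being fetched
            if i + 1 < num_chunks:
                # Kick off the next read before computing on this chunk.
                pending = pool.submit(fetch_chunk, i + 1)
            results.append(analyze(chunk))
    return results


if __name__ == "__main__":
    print(run_overlapped(3))
```

With equal simulated I/O and compute times, the overlapped loop takes roughly half as long as fetching and analyzing each chunk in strict sequence; real gains depend on the actual ratio of transfer to compute cost.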
Keywords
HPC,Big data,HPDA,Hadoop,R language