SciDFS: An In-Situ Processing System for Scientific Array Data Based on Distributed File System

2018 IEEE International Conference on Big Data and Smart Computing (BigComp) (2018)

Abstract
The amount of array data generated by scientific observation instruments has been increasing rapidly. This data is usually stored in standard formats such as HDF5 and NetCDF. To support high-level queries on array data, a number of array DBMSs such as SciDB have been proposed. However, they typically have two drawbacks: slow data loading and no direct support for standard formats. Slow data loading is especially serious, since scientific data may be generated faster than it can be loaded. To address these drawbacks, we propose SciDFS, a distributed in-situ processing system that uses a distributed file system (DFS) to store and manage array data. SciDFS is a hybrid system that tightly integrates the query processing layer of an array DBMS with a DFS via an in-situ layer. It quickly stores raw array data as DFS blocks and processes queries in situ by accessing only the relevant DFS blocks. Through experiments on real NASA satellite array data, we demonstrate three major features of SciDFS: high-performance data loading (50X faster than SciDB), fast in-situ query processing, and the ability to run legacy applications written for the HDF5 format.
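The idea of answering a query by touching only the relevant blocks can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a regularly chunked array in which each chunk corresponds to one storage block, and the function names `chunks_for_query` and `chunk_to_block_id`, as well as the row-major block numbering, are hypothetical.

```python
from itertools import product
from math import ceil


def chunks_for_query(array_shape, chunk_shape, lo, hi):
    """Return grid coordinates of every chunk overlapping the half-open box [lo, hi).

    Assumption (not from the paper): the array is split into regular chunks
    and each chunk is stored as exactly one block.
    """
    if not (len(array_shape) == len(chunk_shape) == len(lo) == len(hi)):
        raise ValueError("all arguments must have the same number of dimensions")
    per_dim = []
    for n, c, a, b in zip(array_shape, chunk_shape, lo, hi):
        if not (0 <= a < b <= n):
            raise ValueError("query box must be non-empty and inside the array")
        first = a // c          # chunk holding the first selected cell
        last = (b - 1) // c     # chunk holding the last selected cell
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


def chunk_to_block_id(chunk_coord, array_shape, chunk_shape):
    """Linearize chunk coordinates in row-major order into a hypothetical block ID."""
    grid = [ceil(n / c) for n, c in zip(array_shape, chunk_shape)]
    block_id = 0
    for coord, g in zip(chunk_coord, grid):
        block_id = block_id * g + coord
    return block_id


if __name__ == "__main__":
    shape, chunk = (1000, 1000), (250, 250)
    touched = chunks_for_query(shape, chunk, lo=(200, 0), hi=(300, 250))
    blocks = [chunk_to_block_id(t, shape, chunk) for t in touched]
    print(touched, blocks)
```

In this sketch a query over rows 200 to 299 and columns 0 to 249 overlaps only 2 of the 16 chunks, so only those two blocks would need to be read; how SciDFS actually maps array regions to DFS blocks is described in the paper itself, not here.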
Keywords
in-situ processing, distributed file system, array DBMS, big data analysis