Astronomical Image Processing with Hadoop

Astronomical Society of the Pacific Conference Series(2011)

引用 33|浏览32
暂无评分
摘要
In the coming decade astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. With a requirement that these images be analyzed in real time to identify moving sources such as potentially hazardous asteroids or transient objects such as supernovae, these data streams present many computational challenges. In the commercial world, new techniques that utilize cloud computing have been developed to handle massive data streams. In this paper we describe how cloud computing, and in particular the mapreduce paradigm, can be used in astronomical data processing. We will focus on our experience implementing a scalable image-processing pipeline for the SDSS database using Hadoop (http://hadoop.apache.org/). This multi-terabyte imaging dataset approximates future surveys such as those which will be conducted with the LSST. Our pipeline performs image coaddition in which multiple partially overlapping images are registered, integrated and stitched into a single overarching image. We will first present our initial implementation, then describe several critical optimizations that have enabled us to achieve high performance, and finally describe how we are incorporating a large in-house existing image processing library into our Hadoop system. The optimizations involve prefiltering of the input to remove irrelevant images from consideration, grouping individual FITS files into larger, more efficient indexed files, and a hybrid system in which a relational database is used to determine the input images relevant to the task. The incorporation of an existing image processing library, written in C++, presented difficult challenges since Hadoop is programmed primarily in Java. We will describe how we achieved this integration and the sophisticated image processing routines that were made feasible as a result. We will end by briefly describing the longer term goals of our work, namely detection and classification of transient objects and automated object classification.
更多
查看译文
关键词
real time,indexation,cloud computing,hybrid system,image processing,data processing,relational database
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要