Achieving scalability in parallel file systems (2005)

Abstract

Parallel computing has become an essential tool for scientific computation. However, achieving balanced application efficiency in this domain requires several supporting technologies beyond raw processing speed. Parallel file systems are one such technology, and they have proven successful in delivering the I/O bandwidth that parallel applications demand. The need for high performance continues to grow, however, prompting efforts to scale parallel computers to ever larger sizes to meet computational demand. The current generation of large-scale systems uses thousands of dedicated processing nodes, and even larger systems are planned for the near future. Conventional file system design assumptions are not sufficient for this class of parallel systems, so parallel file system design techniques must be revisited to achieve the scalability that the next generation of parallel computers requires. We have identified five key obstacles that limit the ability of parallel file systems to scale to systems with thousands of processors: efficiency, complexity, management, consistency, and fault tolerance. To address these obstacles, we present two techniques for parallel I/O: intelligent servers and collective communication. These techniques offload work from client processes, optimize high-level file system operations, and limit the overhead of network communication, together providing a comprehensive framework for building scalable file systems. They not only improve file system scalability but also help broaden the applicability of parallel file systems to problem domains beyond scientific computing. Intelligent servers are an original concept in which servers transparently take control of optimization decisions and communicate with each other to service individual operations.
Collective communication is a well-known optimization in the fields of message passing and distributed shared memory, which we have applied in a novel manner to the parallel file server environment. In this work we present the Parallel Virtual File System 2 (PVFS2), along with several key extensions, as an experimental platform for this study. We then develop an analytical modeling framework for comparing a variety of file system algorithms, both to predict file system performance at scale and to compare potential optimizations. These models are verified against a real-world implementation with hundreds of processors and multiple network environments. Next, we evaluate the implementation of intelligent servers and collective communication in PVFS2 with regard to the five obstacles to scalability listed above. We show that throughput for metadata operations can be doubled for moderately sized systems, and we project an order-of-magnitude improvement for systems with thousands of servers. At the same time, we reduce client code complexity and decrease CPU overhead by 90%. We show that management is improved through intelligent server load balancing and performance monitoring. We also evaluate consistency improvements through case study analysis and demonstrate improved fault tolerance compared with conventional design alternatives. The study concludes with a summary of how the research goals have been met and how previously intractable avenues of future work have been enabled.
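As a rough illustration of why moving coordination from the client to the servers can help at scale, the sketch below counts sequential communication rounds for a hypothetical operation that must reach every server. It compares a client that contacts each server itself, one message per round, with a scheme in which the client hands the request to a single server and the servers then forward it among themselves using a binomial-tree broadcast, a standard collective pattern in which the number of informed nodes doubles each round. This is a toy model under loudly stated assumptions (unit-cost messages, no contention, acknowledgments and server processing time ignored), not the analytical framework or the PVFS2 algorithms described in this work; the function names are hypothetical.

```python
def sequential_rounds(num_servers: int) -> int:
    """Toy model (hypothetical): the client sends one request per round to each server."""
    return num_servers


def server_tree_rounds(num_servers: int) -> int:
    """Toy model (hypothetical) of server-driven collective communication.

    Round 1: the client sends the request to a single server.
    Later rounds: every server that already holds the request forwards it
    to one server that does not (binomial-tree broadcast), so the number
    of informed servers doubles each round.
    Acknowledgments and processing costs are deliberately ignored.
    """
    rounds = 1      # the client's single message to the first server
    informed = 1    # servers that currently hold the request
    while informed < num_servers:
        informed = min(2 * informed, num_servers)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (4, 16, 128, 1024):
        print(f"{n:5d} servers: client-driven={sequential_rounds(n):5d} rounds, "
              f"server tree={server_tree_rounds(n):2d} rounds")
```

Under this toy model, reaching 1,024 servers takes 1,024 sequential rounds when the client does all the work, but only 11 rounds (and a single client message) when the servers cooperate; real performance also depends on latency, replies, and server load, which the sketch ignores.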

Keywords

conventional file system design, parallel computer, parallel file system, file system scalability, parallel application, intelligent server, collective communication, file system performance, achieving scalability, optimize high level file, file system algorithm
