CLARISSE: A Middleware for Data-Staging Coordination and Control on Large-Scale HPC Platforms

CCGRID '16: Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (2016)

Abstract
On current large-scale HPC platforms, the data path from compute nodes to final storage passes through several networks that interconnect a distributed hierarchy of nodes serving as compute nodes, I/O nodes, and file system servers. Although applications compete for resources at various system levels, current system software offers no mechanism for globally coordinating the data flow to attain optimal resource usage or to react to overload and interference.

In this paper we describe CLARISSE, a middleware designed to enhance data-staging coordination and control in the HPC storage I/O software stack. CLARISSE exposes the parallel data flows to a higher-level hierarchy of controllers, opening up the possibility of developing novel cross-layer optimizations based on run-time information. To the best of our knowledge, CLARISSE is the first middleware that decouples the policy, control, and data layers of the software I/O stack in order to simplify the task of globally coordinating data staging on large-scale HPC platforms. To demonstrate how CLARISSE can be used to enhance performance, we present two case studies: an elastic load-aware collective I/O and a cross-application parallel I/O scheduling policy. The evaluation illustrates how coordination can bring significant performance benefits with low overhead by adapting to load conditions and interference.
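The abstract's central idea is that separating the policy, control, and data layers lets a global controller react to load. It can be illustrated with the minimal sketch below. This is a hypothetical Python illustration, not CLARISSE's actual API. The class names `Aggregator`, `LoadAwarePolicy`, and `Controller`, the normalized load value, and the `max_load` threshold are all assumptions made for illustration. The sketch loosely mirrors the elastic load-aware collective I/O case study: it excludes overloaded aggregators and then distributes blocks round-robin over the remaining ones.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Aggregator:
    """Data layer (hypothetical): an I/O aggregator that stages blocks toward storage."""
    name: str
    load: float = 0.0  # assumed normalized load reported upward (0.0 idle, 1.0 saturated)
    staged: List[int] = field(default_factory=list)

    def stage(self, block: int) -> None:
        self.staged.append(block)


class LoadAwarePolicy:
    """Policy layer (hypothetical): decides which aggregators may receive data."""

    def __init__(self, max_load: float) -> None:
        self.max_load = max_load

    def select(self, loads: Dict[str, float]) -> List[str]:
        eligible = sorted(name for name, load in loads.items() if load <= self.max_load)
        # Never leave the data layer without a target: fall back to the least-loaded aggregator.
        return eligible or [min(loads, key=lambda name: loads[name])]


class Controller:
    """Control layer (hypothetical): gathers load, consults the policy, routes data."""

    def __init__(self, aggregators: List[Aggregator], policy: LoadAwarePolicy) -> None:
        self.aggregators = {a.name: a for a in aggregators}
        self.policy = policy

    def write(self, blocks: List[int]) -> Dict[str, List[int]]:
        loads = {name: a.load for name, a in self.aggregators.items()}
        active = self.policy.select(loads)
        # Round-robin the blocks over the aggregators the policy selected.
        for i, block in enumerate(blocks):
            self.aggregators[active[i % len(active)]].stage(block)
        return {name: a.staged for name, a in self.aggregators.items()}


if __name__ == "__main__":
    aggs = [Aggregator("io0", 0.2), Aggregator("io1", 0.9), Aggregator("io2", 0.1)]
    ctl = Controller(aggs, LoadAwarePolicy(max_load=0.5))
    print(ctl.write(list(range(6))))
    # {'io0': [0, 2, 4], 'io1': [], 'io2': [1, 3, 5]}
```

The policy sees only a map of loads and returns a list of names. It could therefore be swapped for another rule, such as a cross-application scheduling policy, without touching the data path. That decoupling is the point the abstract makes; the specific mechanics shown here are assumed.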
Keywords
HPC, storage, data staging, parallel I/O, collective I/O, I/O scheduling