System Management in the BlueGene/L Supercomputer

IPDPS (2003)

Abstract
With 65,536 compute nodes, the BlueGene/L supercomputer represents a new level of scalability for parallel systems. In this paper, we discuss system management and control for BlueGene/L, including machine booting, software installation, user account management, system monitoring, and job execution. We address the issue of scalability by organizing the system hierarchically. The 65,536 compute nodes are organized into 1,024 clusters of 64 compute nodes each, called processing sets. Each processing set is under the control of a 65th node, called an I/O node. The 1,024 processing sets can then be managed to a great extent as a regular Linux cluster. Regular cluster management is complemented by BlueGene/L-specific services, performed by a service node over a separate control network.
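The arithmetic behind this hierarchy can be illustrated with a minimal sketch. The node counts come from the abstract; the contiguous, zero-based numbering of compute nodes and the helper names `pset_of` and `compute_nodes_in` are assumptions made for illustration only and are not taken from the paper.

```python
# Sizes stated in the abstract.
COMPUTE_NODES = 65_536
PSET_SIZE = 64                            # compute nodes per processing set
NUM_PSETS = COMPUTE_NODES // PSET_SIZE    # 1,024 processing sets

# One I/O node per processing set, so the machine has this many nodes overall.
TOTAL_NODES = COMPUTE_NODES + NUM_PSETS


def pset_of(compute_node: int) -> int:
    """Return the processing set that owns a compute node.

    Assumption: compute nodes are numbered 0..65535 and assigned to
    processing sets in contiguous blocks of 64. The real machine may
    use a different mapping.
    """
    if not 0 <= compute_node < COMPUTE_NODES:
        raise ValueError(f"compute node out of range: {compute_node}")
    return compute_node // PSET_SIZE


def compute_nodes_in(pset: int) -> range:
    """Return the (hypothetical) compute node indices in a processing set."""
    if not 0 <= pset < NUM_PSETS:
        raise ValueError(f"processing set out of range: {pset}")
    start = pset * PSET_SIZE
    return range(start, start + PSET_SIZE)


if __name__ == "__main__":
    print(NUM_PSETS, TOTAL_NODES)
    print(pset_of(130), list(compute_nodes_in(2))[:3])
```

Under this numbering, a management tool only needs to address 1,024 I/O nodes directly, and each I/O node is responsible for the 64 compute nodes in its own processing set.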
Keywords
system on a chip, software installation, scalability, system management, linux cluster, linux, control systems, system monitoring, management system, control network, parallel systems, system on chip, computer architecture, concurrent computing, software development