Workflow Submit Nodes as a Service on Leadership Class Systems

PEARC '20: Practice and Experience in Advanced Research Computing, Portland, OR, USA, July 2020

Abstract
Today, DOE scientists have access to high performance computing (HPC) facilities with very powerful systems that let them execute their computations faster, more efficiently, and at greater scales than ever before. To advance their knowledge and produce new discoveries, scientists rely on workflows, sometimes very complex ones, that give them an easy way to automate, reproduce, and verify their computations. Historically, however, creating workflow submission environments at large HPC facilities has been cumbersome, requiring expertise and many person-hours of effort because of the peculiarities, policies, and restrictions of these systems. In this paper we discuss the approach a large DOE facility, the Oak Ridge Leadership Computing Facility (OLCF), is taking to provide containers as a service to its users. This capability is used to create Pegasus workflow management system submit nodes as a service (WSaaS) at OLCF, targeting the Summit supercomputer. The deployment builds on the Kubernetes/OpenShift cluster (Slate) that exists within OLCF's DMZ and on its automation triggers. Additionally, we evaluate the overhead of our approach and the effort required to deploy it, compared with previous solutions such as setting up a Pegasus submission environment on OLCF's login nodes or submitting jobs remotely via the rvGAHP.
Keywords
leadership class systems, nodes, service