Towards Acceptance Testing at the Exascale Frontier

Verónica G. Vergara Larrea, Michael J. Brim, Arnold Tharrington, Reuben Budiardja, Wayne Joubert

semanticscholar(2020)

Abstract
At the 2007 Cray User Group meeting, the Oak Ridge Leadership Computing Facility (OLCF) introduced the OLCF Test Harness (OTH), a framework[1] used for acceptance testing of the Jaguar supercomputer[2]. Since then, the OTH framework has evolved to version 2.0, which adds new features and streamlines usability. The OTH is the key piece of software used to orchestrate acceptance testing for all OLCF computational resources before they are deployed for production use, including our leadership-class high performance computing (HPC) systems. The OTH framework is written in Python and is publicly available[3]. In this paper, we first describe the requirements, design, and structure of the OTH. Then, we present specific improvements developed to support acceptance testing of the OLCF’s Summit system[4]. We also showcase new OTH features that have been added to streamline the acceptance test process, along with the motivation behind those changes. As part of this work, we evaluated different workflow tools to determine whether they could complement the OTH in two key areas: automation and reporting. We discuss the advantages and disadvantages identified for each tool. Lastly, we summarize the challenges and lessons learned from using the OTH for the acceptance of the last three flagship systems at the OLCF. These may be useful for other HPC centers developing their own testing frameworks or those interested in using the OTH.

Keywords: automated testing framework, high performance computing, workflows