Exploring New Monitoring and Analysis Capabilities on Cray’s Software Preview System

Jim Brandt, Connor Brown, Scott Donoho, Ann Gentile, Joe Greenseid, William Kramer, Patti Langer, Aamir Rashid, Kevan Rehm, Michael Showerman

Semantic Scholar (2019)

Abstract
Staff and engineers from Cray, NCSA, and Sandia are collaborating to investigate, and provide new insights into, the monitoring aspects of Cray’s recently released “Software Preview System.” In the preview system, Cray has implemented the LDMS framework within the monitoring infrastructure. In this work, we extend that implementation and leverage the Cray infrastructure to add new monitoring capabilities suitable for additional node-level and application monitoring. We use the Cray-provided telemetry bus to transport and consume the new metric data, and we explore scale and performance considerations. We detail the issues that impact or facilitate implementing these functionalities within Cray’s new, container-based services system. In our implementation, we adhere to Cray’s design philosophy, which is intended to ensure the reliability and availability of the Cray-collected metrics and system services.