PIKA: Center-Wide and Job-Aware Cluster Monitoring

2020 IEEE International Conference on Cluster Computing (CLUSTER), 2020

Abstract
Nowadays, performance optimization is more or less an established procedure in high-performance computing (HPC) centers. To sustainably increase compute efficiency of such systems, we need to increase the awareness of efficiency on both the operator's and the users' side. Therefore, we propose an infrastructure for continuous monitoring and analysis, which automatically characterizes HPC jobs and provides a systematic approach to identify underperforming compute jobs with optimization potential. The recorded metadata and time-series data can be visualized live at runtime or post-mortem and are eventually stored for long-term analysis. The monitoring has a negligible overhead on the compute nodes and neither influences nor limits the user applications.
Keywords
monitoring,data collection,data visualization,data analysis,collectd,LIKWID