Systematically inferring I/O performance variability by examining repetitive job behavior

SC(2021)

Abstract
Monitoring and analyzing I/O behavior is critical to the efficient utilization of parallel storage systems. Unfortunately, as I/O requirements and resource contention increase, I/O performance variability is becoming a significant concern. This paper investigates I/O behavior and performance variability on a large-scale high-performance computing (HPC) system using a novel methodology that first uses an I/O characterization tool to identify similar jobs from the same application, and then detects potential I/O performance variability across those jobs. We demonstrate and discuss how this methodology can be used to perform temporal and feature analyses that detect interesting I/O performance variability patterns in production HPC systems, and what those patterns imply for operating and managing large-scale systems.
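The two-step idea in the abstract (group similar jobs of one application, then look for performance spread within each group) can be illustrated with a minimal sketch. This is not the paper's method: the job records, the exact-match "signature" used as a stand-in for similarity, the coefficient-of-variation metric, and the `threshold` and `min_jobs` parameters are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical job records: (application, I/O signature, throughput in MB/s).
# The signature stands in for features an I/O characterization tool might report.
jobs = [
    ("app_a", (1024, 16), 950.0),
    ("app_a", (1024, 16), 610.0),
    ("app_a", (1024, 16), 940.0),
    ("app_a", (4096, 64), 1200.0),
    ("app_b", (512, 8), 300.0),
    ("app_b", (512, 8), 305.0),
]


def group_similar_jobs(records):
    """Group throughputs by (application, signature); exact match is an assumed similarity rule."""
    groups = defaultdict(list)
    for app, signature, throughput in records:
        groups[(app, signature)].append(throughput)
    return groups


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def flag_variable_groups(records, threshold=0.1, min_jobs=2):
    """Return groups whose throughput spread exceeds the (assumed) threshold."""
    flagged = {}
    for key, values in group_similar_jobs(records).items():
        if len(values) < min_jobs:
            continue  # a single run cannot show run-to-run variability
        cv = coefficient_of_variation(values)
        if cv > threshold:
            flagged[key] = cv
    return flagged


flagged = flag_variable_groups(jobs)
```

In this toy data only the `app_a` group with three runs is flagged, because its runs issue the same I/O yet one of them is markedly slower. A real analysis would likely replace the exact-match grouping with a clustering step over many I/O features.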
Keywords
Storage systems, HPC, HPC I/O, performance variability, machine learning, distributed computing systems