AquaSee: Predict Load and Cooling System Faults of Supercomputers Using Chilled Water Data

Journal of Computer Science and Technology(2020)

引用 1|浏览18
暂无评分
摘要
An analysis of real-world operational data of Tianhe-1A (TH-1A) supercomputer system shows that chilled water data not only can reflect the status of a chiller system but also are related to supercomputer load. This study proposes AquaSee, a method that can predict the load and cooling system faults of supercomputers by using chilled water pressure and temperature data. This method is validated on the basis of real-world operational data of the TH-1A supercomputer system at the National Supercomputer Center in Tianjin. Datasets with various compositions are used to construct the prediction model, which is also established using different prediction sequence lengths. Experimental results show that the method that uses a combination of pressure and temperature data performs more effectively than that only consisting of either pressure or temperature data. The best inference sequence length is two points. Furthermore, an anomaly monitoring system is set up by using chilled water data to help engineers detect chiller system anomalies.
更多
查看译文
关键词
supercomputer,chilled water data,sensor network,load prediction
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要