Continuous learning of HPC infrastructure models using big data analytics and in-memory processing tools.

DATE(2017)

Abstract
Exascale computing represents the next leap in the HPC race. Reaching this level of performance is subject to several engineering challenges, such as energy consumption, equipment cooling, reliability, and massive parallelism. Model-based optimization is an essential tool in the design and control of energy-efficient, reliable, and thermally constrained systems. However, in the Exascale domain, model learning techniques tailored to a specific supercomputer require real measurements and must therefore handle and analyze a massive amount of data coming from the HPC monitoring infrastructure. This rapidly becomes a "big data" scale problem. The common approach, in which measurements are first stored in large databases and then processed, is no longer affordable because of increasing storage costs and the lack of real-time support. Instead, cloud-based machine learning techniques now aim to build on-line models using real-time approaches such as "stream processing" and "in-memory" computing, which avoid storage costs and enable fast data processing. Moreover, the fast delivery of the models and their adaptation to quick data variations make the decision stage of the optimization loop more effective and reliable. In this paper we leverage scalable, lightweight, and flexible IoT technologies, such as the MQTT protocol, to build a highly scalable HPC monitoring infrastructure able to handle the massive sensor data produced by next-gen HPC components. We then show how state-of-the-art tools for big data computing and analysis, such as Apache Spark, can be used to manage the huge amount of data delivered by the monitoring layer and to build adaptive models in real time using on-line machine learning techniques.
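
The idea of learning a model on-line from a measurement stream, rather than storing the data first and processing it later, can be illustrated with a minimal sketch. This is not the paper's implementation: the generator `sensor_stream` is a hypothetical stand-in for an MQTT-fed monitoring layer, the synthetic load-to-power relation is an assumption made for illustration, and plain stochastic gradient descent in pure Python stands in for whatever on-line learner a Spark deployment would use. Each sample is used once to update the model and is then discarded, so nothing is written to storage.

```python
import random


class OnlineLinearModel:
    """Linear model y = w.x + b, updated one sample at a time with plain SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        """Take one gradient step on the squared error of a single sample."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err


def sensor_stream(n_samples, seed=0):
    """Hypothetical stand-in for a monitoring feed.

    Yields (load, power) pairs from an assumed synthetic relation:
    power = 0.5 + 1.5 * load + small Gaussian noise (arbitrary units).
    """
    rng = random.Random(seed)
    for _ in range(n_samples):
        load = rng.random()
        power = 0.5 + 1.5 * load + rng.gauss(0.0, 0.01)
        yield load, power


def train_online(stream, model):
    """Consume the stream sample by sample; no sample is kept after its update."""
    for load, power in stream:
        model.update([load], power)
    return model


if __name__ == "__main__":
    model = train_online(sensor_stream(5000), OnlineLinearModel(n_features=1))
    print("learned slope:", model.w[0], "learned offset:", model.b)
```

Because the model is updated after every sample, it keeps tracking the data if the underlying relation drifts, which is the property the abstract relies on for the decision stage of the optimization loop.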
Keywords
continuous learning, HPC infrastructure models, big data analytics, in-memory processing tools, exascale computing, HPC race, model-based optimization, design process, energy efficient system, thermally constrained systems, exascale domain, model learning techniques, supercomputer, HPC monitoring infrastructure, big data scale problem, large databases, cloud-based machine learning techniques, real-time approaches, optimization loop decision stage, IoT technologies, next-gen HPC components, big data computing