Biography
He has worked across a range of interdisciplinary research areas, including applied mathematics, computational and symbolic algebra, numerical analysis, computing in high energy physics, bioinformatics, high-performance computing, and computer systems. He has supervised 10 PhD students to graduation and has co-authored more than 125 refereed publications.
The ongoing DMTCP (Distributed MultiThreaded Checkpointing) project supports transparent checkpointing (snapshots) with no modification to the target application binary. DMTCP extends transparent checkpointing to external hardware and software environments, such as GPUs and the network interconnects used by MPI in HPC. Over 150 refereed publications document examples of using DMTCP.
The newest direction for DMTCP is to make it a standard tool for supercomputing and HPC. In collaboration with the DOE’s NERSC supercomputing center, the DMTCP project (including MANA for MPI and CRAC for CUDA) is being extended and validated for production use on NERSC’s Perlmutter supercomputer (expected to become the #6 supercomputer in the world when fully installed). DMTCP, MANA, and CRAC will enable scientists to run long computations by using checkpoint-restart to chain together multiple allocation time slots; currently, a single allocation time slot is limited to 48 hours. As a showcase project, this work is intended to let other HPC centers adopt the same technology.
Papers (33 in total)
- Prashant Singh Chouhan, Harsh Khetawat, Neil Resnik, Twinkle Jain, Rohan Garg, Gene Cooperman, Rebecca Hartman-Baker, Zhengji Zhao. arXiv (2021)
- arXiv (2021)
- CoRR, no. 3 (2020): 13
- HPDC '19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (2019): 49-60
- Temporal meta-programming: treating time as a spatial dimension (2012). Cited by 23.
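Data Disclaimer
The data on this page come from public internet sources, partner publishers, and automated AI analysis. We make no commitment or guarantee regarding the validity, accuracy, correctness, reliability, completeness, or timeliness of the data. For questions, contact us by email: report@aminer.cn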