ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation
arXiv (Cornell University) (2023)
Abstract
Modern climate projections lack adequate spatial and temporal resolution due
to computational constraints. A consequence is inaccurate and imprecise
predictions of critical processes such as storms. Hybrid methods that combine
physics with machine learning (ML) have introduced a new generation of higher
fidelity climate simulators that can sidestep Moore's Law by outsourcing
compute-hungry, short, high-resolution simulations to ML emulators. However,
this hybrid ML-physics simulation approach requires domain-specific treatment
and has been inaccessible to ML experts because of a lack of training data and
relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset
designed for hybrid ML-physics research. It comprises multi-scale climate
simulations, developed by a consortium of climate scientists and ML
researchers. It consists of 5.7 billion pairs of multivariate input and output
vectors that isolate the influence of locally-nested, high-resolution,
high-fidelity physics on a host climate simulator's macro-scale physical state.
The dataset is global in coverage, spans multiple years at high sampling
frequency, and is designed such that resulting emulators are compatible with
downstream coupling into operational climate simulators. We implement a range
of deterministic and stochastic regression baselines to highlight the ML
challenges and their scoring. The data
(https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code
(https://leap-stc.github.io/ClimSim) are released openly to support the
development of hybrid ML-physics and high-fidelity climate simulations for the
benefit of science and society.
Keywords
physics emulators, climate, large-scale, high-resolution, multi-scale