Upscaling Soil Organic Carbon Measurements at the Continental Scale Using Multivariate Clustering Analysis and Machine Learning

JOURNAL OF GEOPHYSICAL RESEARCH-BIOGEOSCIENCES(2024)

Abstract
Estimates of soil organic carbon (SOC) stocks are essential for many environmental applications. However, current SOC maps give substantially inconsistent estimates of SOC stocks for the U.S. We propose a framework that combines unsupervised multivariate geographic clustering (MGC) with supervised Random Forests regression, improving SOC maps by capturing heterogeneous relationships between SOC and its drivers. We first used MGC to divide the U.S. into 20 SOC regions based on the similarity of covariates (soil biogeochemical, bioclimatic, biological, and physiographic variables). We then trained a separate Random Forests model for each SOC region using environmental covariates and SOC observations. Our estimated SOC stocks for the U.S. (52.6 ± 3.2 Pg for 0-30 cm and 108.3 ± 8.2 Pg for 0-100 cm depth) were comparable in magnitude to those of existing products such as the Harmonized World Soil Database (HWSD; 46.7 Pg for 0-30 cm and 90.7 Pg for 0-100 cm depth) and SoilGrids 2.0 (45.7 Pg for 0-30 cm and 133.0 Pg for 0-100 cm depth). However, independent validation with soil profile data from the National Ecological Observatory Network showed that our approach (R² = 0.51) outperformed the estimates from HWSD (R² = 0.23) and SoilGrids 2.0 (R² = 0.39) for the topsoil (0-30 cm). Uncertainty analysis (e.g., low representativeness and high coefficients of variation) identified regions requiring more measurements, such as Alaska and the deserts of the U.S. Southwest. Our approach effectively captures the heterogeneous relationships between widely available predictors and the current SOC baseline across regions, offering reliable SOC estimates at 1 km resolution for benchmarking Earth system models.

Plain Language Summary

Soils represent the largest terrestrial carbon (C) pool. To understand how soil C will change under a changing climate, we first need an accurate estimate of how much soil organic carbon (SOC) is present. However, SOC maps for the U.S. are highly variable. In this study, we developed a new framework for estimating SOC stocks across the entire U.S. using site-level measurements. We first divided the U.S. into 20 regions based on environmental conditions and then built a machine-learning model for each region to produce an accurate, continuous map. Our method captured different relationships between environmental variables and SOC across regions and improved overall estimates of the amount of SOC in U.S. soils. Climate was an important predictor of SOC in all regions, while soil properties, plant inputs, and elevation played key roles in some regions. As part of this effort, we identified areas with high uncertainty that could be targeted for additional measurements in the future (e.g., Alaska and the deserts of the Southwest U.S.). Our method provides new insights for the soil mapping community and yields robust SOC estimates that can inform how models represent the terrestrial C cycle.

Key Points

We estimated soil organic carbon (SOC) stocks at the continental scale by coupling multivariate geographic clustering with machine learning.
Independent validation showed that our approach (R² = 0.51) outperformed the Harmonized World Soil Database (R² = 0.23) and SoilGrids 2.0 (R² = 0.39) for the topsoil.
Regions with low representativeness and high uncertainty can inform future sampling efforts to further improve SOC estimation.
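The two-stage design described above (cluster first, then fit one regression model per cluster) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: MGC is approximated by plain k-means on z-scored covariates with deterministic farthest-first seeding, and each per-region Random Forests model is swapped for a one-predictor least-squares line so the example runs on the Python standard library alone. All function names, covariates, and numbers are made up for illustration.

```python
import math

# Hypothetical sketch: k-means on z-scored covariates stands in for MGC, and a
# one-predictor least-squares line stands in for each per-region Random Forests model.

def zscore(rows):
    """Standardize each covariate column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard constant columns
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first (maximin) seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def fit_regional_models(cluster_covs, predictor, soc, k):
    """Cluster sites into k regions, then fit a separate model in each region."""
    labels, _ = kmeans(zscore(cluster_covs), k)
    models = {}
    for j in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        models[j] = fit_line([predictor[i] for i in idx], [soc[i] for i in idx])
    return labels, models

# Toy data (made up): five cool, wet sites and five warm, dry sites.
covs = [[5.0, 900.0]] * 5 + [[20.0, 250.0]] * 5  # [mean temperature, precipitation]
npp = [1.0, 2.0, 3.0, 4.0, 5.0] * 2               # hypothetical plant-input proxy
soc = [2.0 * x + 1.0 for x in npp[:5]] + [0.5 * x + 1.0 for x in npp[5:]]
labels, models = fit_regional_models(covs, npp, soc, k=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(models)  # {0: (1.0, 2.0), 1: (1.0, 0.5)}
```

Because each region gets its own fitted model, the same predictor can carry a different slope in each region (2.0 versus 0.5 in the toy data), which is the kind of heterogeneous SOC-driver relationship a single continental model would average away. The sketch clusters the sampled sites directly for brevity; the abstract describes dividing the whole U.S. into regions, so a closer analogue would cluster every grid cell and then assign each site the region of the cell it falls in.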

Keywords
soil organic carbon stock, gridded SOC data, multivariate geographic clustering, representativeness analysis, random forests, uncertainty
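The abstract's uncertainty analysis uses low representativeness and high coefficients of variation to find under-sampled regions. The sketch below shows one common way to compute each quantity; it is an assumption, not the authors' procedure. Representativeness is taken here as the Euclidean distance from a grid cell to its nearest sampled site in standardized covariate space, and the coefficient of variation (CV) is computed across an ensemble of predictions for the same cell (for example, from models refit on bootstrap resamples). The thresholds `max_dist` and `max_cv` and all function names are hypothetical.

```python
import math
import statistics

# Hypothetical sketch of two uncertainty diagnostics. Cells and sites are assumed
# to be already standardized with the same covariate means and standard deviations.

def representativeness_distance(cells, sites):
    """Euclidean distance from each cell to its nearest sampled site in covariate space.
    Larger distances mean the cell's environment is poorly represented by the sites."""
    return [min(math.dist(cell, site) for site in sites) for cell in cells]

def coefficient_of_variation(predictions):
    """CV of one cell's ensemble predictions: sample standard deviation / mean.
    Needs at least two predictions (statistics.stdev requirement)."""
    mean = statistics.mean(predictions)
    if mean == 0:
        return math.inf  # CV is undefined for a zero mean
    return statistics.stdev(predictions) / mean

def flag_for_sampling(cells, sites, ensemble, max_dist=1.0, max_cv=0.3):
    """Flag cells that are far from every site or whose predictions disagree strongly."""
    distances = representativeness_distance(cells, sites)
    return [d > max_dist or coefficient_of_variation(preds) > max_cv
            for d, preds in zip(distances, ensemble)]

# Toy example (made-up values): one sampled site at the covariate origin.
sites = [[0.0, 0.0]]
cells = [[0.0, 0.0], [3.0, 4.0], [0.3, 0.4]]
ensemble = [[10.0, 10.0, 10.0],  # well sampled, models agree
            [8.0, 9.0, 10.0],    # models agree, but far from any site
            [2.0, 6.0, 4.0]]     # near a site, but models disagree (CV = 0.5)
print(flag_for_sampling(cells, sites, ensemble))  # [False, True, True]
```

A cell can be flagged for either reason: the second toy cell is environmentally unlike any site even though its predictions agree, while the third sits near a site but its predictions disagree. Mapping such flags would highlight candidate areas for new measurements, in the spirit of the Alaska and Southwest desert regions named in the abstract.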