A framework for optimizing environmental covariates to support model interpretability in digital soil mapping

Babak Kasraei, Margaret G. Schmidt, Jin Zhang, Chuck E. Bulmer, Deepa S. Filatow, Adrienne Arbor, Travis Pennell, Brandon Heung

Geoderma (2024)

Abstract
A common practice in digital soil mapping (DSM) is to incorporate many environmental covariates into a machine-learning algorithm to predict the spatial patterns of soil attributes. Variance inflation factor (VIF), principal component analysis (PCA), and recursive feature elimination (RFE) are three statistical methods that can be used to reduce the number of covariates. This study aims 1) to compare the VIF and PCA approaches; 2) to identify an approach for determining the minimum number of covariates in DSM, ensuring model parsimony by applying RFE after VIF; and 3) to examine methods for interpreting the impact of covariates on the variability of the predicted soil properties. The study area was the province of British Columbia (BC), Canada. This study used legacy data for four soil properties to make digital soil maps: soil organic carbon (SOC%), pH, clay%, and coarse fragments (CF%). Seven models were built for each soil property to determine how the number of covariates, as selected by the various methods, influenced the validation results. The results showed that the number of covariates could be reduced from 70 to between 4 and 12 with little or no difference in the concordance correlation coefficient (CCC) validation results. The CCC of the pH models using 70 and 7 covariates was 0.74 in both cases, and for the other soil properties the difference was negligible. The validation results of the PCA models showed that PCA was less effective than VIF at reducing the number of covariates. Moreover, this study showed that covariates related to precipitation were the most important for modeling SOC%, soil pH, and clay%, whereas topographic covariates were the most influential for modeling soil CF%. This study emphasizes the potential benefits of combining various data reduction methods to achieve optimal outcomes and generate the most parsimonious and interpretable models.
Keywords
Digital Soil Mapping, Variance Inflation Factor, Principal Component Analysis, Recursive Feature Elimination, Quantile Regression
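
For readers unfamiliar with two of the quantities named in the abstract, the following is a minimal sketch of VIF-based covariate screening and of the concordance correlation coefficient. It is not the authors' implementation: the function names (`vif`, `vif_filter`, `ccc`) are hypothetical, the VIF threshold of 10 is a common rule of thumb assumed here rather than a value taken from the study, and the RFE and PCA steps are not shown.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary
    least-squares regression of column j on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every covariate except column j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        # Guard against division by zero for perfectly collinear columns.
        out[j] = 1.0 / max(1.0 - r2, 1e-12)
    return out


def vif_filter(X, names, threshold=10.0):
    """Drop the covariate with the largest VIF until all are <= threshold.

    The default threshold of 10 is an assumption (a widely used rule of
    thumb), not a value reported in the abstract.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = vif(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        keep.pop(worst)
    return [names[i] for i in keep]


def ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    cov = np.mean((o - o.mean()) * (p - p.mean()))
    return 2.0 * cov / (o.var() + p.var() + (o.mean() - p.mean()) ** 2)
```

Unlike the Pearson correlation, the CCC penalizes systematic bias: predictions that track the observations perfectly but are offset by a constant score below 1.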