Exploring effective ways to increase reliable positive samples for machine learning-based urban waterlogging susceptibility assessments.

Xianzhe Tang, Zhanyu Wu, Wei Liu, Juwei Tian, Luo Liu

Journal of Environmental Management (2023)

Abstract
Machine learning (ML)-based urban waterlogging susceptibility studies suffer from class imbalance, because fewer positive samples are generally available than potential negative samples. Few studies have tried to improve results by improving the quality of training samples. To address this gap, we explored effective approaches for reliably increasing the number of positive samples in such studies. Two methods were employed: the Synthetic Minority Over-Sampling Technique (SMOTE), representative of oversampling approaches that synthesize new samples in the feature space, and the Optimized Seed Spread Algorithm (OSSA), representative of physical approaches that simulate the potentially inundated area based on the mechanisms of water flow. Waterlogging in Shenzhen was used as a case study with eight selected spatial variables. An elaborate experiment compared the quality of the added samples in terms of classifier performance and the accuracy of the resulting waterlogging susceptibility maps (WSMs). The results indicated that (1) classifiers trained with SMOTE-generated samples performed worse than those trained on the original samples alone, whereas OSSA improved the trained classifiers, and (2) the accuracy of the WSMs was not improved with SMOTE but increased markedly with OSSA. These results may be driven by the diversity of information and features of the added samples. This study indicates that SMOTE fails to synthesize reliable samples when applied to waterlogging analysis in Shenzhen, whereas OSSA, which simulates the potential submerged regions based on the mechanisms of disaster occurrence and spread, is an effective solution for generating reliable positive samples.
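To make the oversampling idea concrete, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name `smote_like`, the two-feature vectors, and the parameter values are hypothetical and chosen only for illustration. It shows the core step, which is that each synthetic positive sample is placed on the line segment between an existing positive sample and one of its nearest positive neighbours in feature space.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, normalised feature vectors for known waterlogging points
# (illustrative values only, not data from the study).
positives = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.30, 0.30]]
new_points = smote_like(positives, n_new=5)
```

Because every synthetic point is a convex combination of two existing positives, the added samples stay inside the region of feature space already covered by the observed ones. A physical approach such as OSSA instead derives new positives from simulated inundation, so its samples are not constrained in this way.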