Long-term hourly air quality data bridging of neighboring sites using automated machine learning: A case study in the Greater Bay area of China

Atmospheric Environment (2024)

Abstract
Long-term air pollution data are essential for formulating air quality management policies and assessing their impacts on public health. However, data gaps are inevitable in air pollution observations at monitoring sites. This study proposes a machine learning approach that uses data from neighboring sites to reconstruct missing data. Hourly observations from three neighboring sites in the Pearl River Delta (PRD) region of South China were used for data retrieval: the NC site (2006-2015) and the JXL and PYZX sites (2014-2022). The overlapping period (May 2014 to December 2015) was used to train and evaluate the machine learning models. The performance of 11 algorithms (CatBoost, XGBoost, LightGBM, LightGBMXT, LightGBMLarge, RandomForestMSE, ExtraTreeMSE, NeuralNetTorch, NeuralNetFastAI, KNeighborsDist, and KNeighborsUnif) in retrieving the major air pollutants O3, NO2, PM2.5, PM10 and SO2 was benchmarked using a set of evaluation metrics. CatBoost performed best and was therefore adopted to reconstruct air pollutant data at NC (2016-2022) and PYZX (2008-2014). Long-term data (2006-2022) at the NC site were obtained by combining the observed and retrieved data. Over the past 15 years, the O3 concentration at NC increased by 72% at a rate of 0.83 ppb yr⁻¹ (3.2% yr⁻¹). In contrast, substantial reductions were observed at the NC site for NO2 (61%), PM2.5 (51%) and PM10 (42%), at rates of -1.27 ppb yr⁻¹ (-5.9% yr⁻¹), -1.96 μg m⁻³ yr⁻¹ (-5.8% yr⁻¹) and -2.32 μg m⁻³ yr⁻¹ (-5.2% yr⁻¹), respectively. SO2 exhibited the most pronounced reduction (79%) of all species, with two distinct rates: -4.10 ppb yr⁻¹ (-27.4% yr⁻¹) for 2008-2012 and -0.40 ppb yr⁻¹ (-6.2% yr⁻¹) for 2012-2022.
This study demonstrates the feasibility of using machine learning to fill data gaps in air pollution monitoring networks and highlights the importance of continuous long-term air pollution data for reviewing air quality management policies.
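The bridging workflow summarized in the abstract (train on the overlap period, evaluate, fill gaps at the target site, then estimate linear trends) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: ordinary least-squares regression on a single neighboring site stands in for CatBoost, the chronological 80/20 holdout split and the RMSE/R² metrics are assumed (the abstract does not name its metrics), and the relative trend is assumed to be the slope divided by the series mean. All function names are hypothetical.

```python
"""Hypothetical sketch of neighbor-site data bridging and trend estimation.

Assumptions (not from the paper): single-neighbor OLS stands in for CatBoost,
a chronological holdout split with RMSE/R^2 scoring, and the relative trend
expressed as a percentage of the series mean.
"""
from math import sqrt
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error between observations and predictions."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination of predictions against observations."""
    mo = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def evaluate_holdout(
    x: Sequence[float], y: Sequence[float], train_frac: float = 0.8
) -> Tuple[float, float]:
    """Fit on the earlier part of the overlap period, score on the later part."""
    cut = int(len(x) * train_frac)
    a, b = fit_simple_ols(x[:cut], y[:cut])
    pred = [a + b * xi for xi in x[cut:]]
    return rmse(y[cut:], pred), r2(y[cut:], pred)


def bridge(target: List[Optional[float]], neighbor: List[float]) -> List[float]:
    """Fill None gaps in the target series from a neighbor-site regression."""
    # Train only on hours where both sites have observations (the overlap).
    pairs = [(n, t) for t, n in zip(target, neighbor) if t is not None]
    a, b = fit_simple_ols([n for n, _ in pairs], [t for _, t in pairs])
    # Keep real observations; use model predictions only where data are missing.
    return [t if t is not None else a + b * n for t, n in zip(target, neighbor)]


def linear_trend(years: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Return (absolute slope per year, slope as % of the series mean per year)."""
    _, slope = fit_simple_ols(years, values)
    return slope, 100.0 * slope / mean(values)


if __name__ == "__main__":
    # Toy data: the target site reads 2 units above its neighbor.
    print(bridge([12.0, 22.0, None, 42.0, None], [10.0, 20.0, 30.0, 40.0, 50.0]))
    print(linear_trend([2010, 2011, 2012, 2013], [20.0, 21.0, 22.0, 23.0]))
```

In the study's setting the regressor would be CatBoost trained on features from multiple neighboring sites rather than a single linear predictor. A piecewise trend such as the two SO2 rates could be approximated by calling `linear_trend` separately on the 2008-2012 and 2012-2022 annual means.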
Keywords
Long-term trend, Air pollution, Machine learning, Data gap, Monitoring network