Building A Machine Learning Model To Predict Sample Pesticide Content Utilizing Thermal Desorption MION-CIMS Analysis

Federica Bortolussi, Hilda Sandström, Fariba Partovi, Joona Mikkilä, Patrick Rinke, Matti Rissanen

crossref(2024)

Abstract
Pests significantly impact crop yields, leading to food insecurity. Pesticides are substances, or mixtures of substances, made to eliminate or control pests, or to regulate the growth of crops. Currently, more than 1000 pesticides are available on the market. However, their long-lasting environmental impact necessitates strict regulation, especially regarding their presence in food (FAO, 2022). Pesticides also play a role in the atmosphere: their volatilization can produce oxidized products through photolysis or OH reactions, and they can be transported over large distances. The fundamental properties and behaviours of these compounds are still not well understood. Because of their complex structure, even low-level DFT computations can be extremely expensive. This project applies machine learning (ML) tools to chemical ionization mass spectra, with the ultimate aim of developing a technique capable of predicting the peak intensities of the spectra and the sensitivity of chemical ionization mass spectrometry (CIMS) to pesticides. The primary challenge is to develop an ML model that comprehensively explains ion-molecule interactions while minimizing computational costs. Our data set comprises different standard mixtures containing, in total, 716 pesticides, measured at five different concentrations with an Orbitrap atmospheric pressure CIMS equipped with a multi-scheme chemical ionization inlet (MION) (Rissanen et al., 2019; Partovi et al., 2023). The reagents of the ionization methods are CH2Br2, H2O, O2 and (CH3)2CO, generating Br-, H3O+, O2- and [(CH3)2CO + H]+ ions, respectively.
The project follows a general ML workflow: after an exploratory analysis, the data are preprocessed and fed to the ML algorithm, which first classifies which ionization method is able to detect the molecule and then predicts the peak intensity of each pesticide; the accuracy of the prediction is obtained by measuring the performance of the model. A random forest classifier was chosen to classify the ionization methods, that is, to predict which one was able to detect each pesticide. The regression was performed with a kernel ridge regressor (KRR). Each algorithm was run with several types of molecular descriptors (including topological fingerprint, MACCS keys and many-body tensor representation), to test which one represents the molecular structure most accurately. The results of the exploratory analysis highlight different trends between the positive and negative ionization methods, suggesting that different ion-molecule mechanisms are involved (Figure 1). The classification reaches around 80% accuracy for each ionization method with all four molecular descriptors tested, while the regression predicts the logarithm of the intensities of each ionization method fairly well, reaching an error of 0.5 with MACCS keys for the (CH3)2CO reagent (Figure 2).

Figure 1: Distribution of pesticide peak intensities for each reagent ion at five different concentrations.

Figure 2: Comparison of the KRR performance on (CH3)2CO reagent data with four different molecular descriptors.
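The regression step of the workflow can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic random bit vectors standing in for binary fingerprints such as MACCS keys, the Tanimoto kernel and the ridge value are assumptions (the abstract does not state the kernel or hyperparameters), and the helper names (tanimoto, solve, krr_fit, krr_predict, rmse, demo) are hypothetical. The sketch fits kernel ridge regression to the base-10 logarithm of the peak intensity and reports a root-mean-square error on held-out samples. In practice a library implementation would normally be used instead of the hand-written solver.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def krr_fit(train_x, train_y, ridge=1e-2):
    """Return dual coefficients alpha solving (K + ridge * I) alpha = y."""
    n = len(train_x)
    gram = [[tanimoto(train_x[i], train_x[j]) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve(gram, train_y)


def krr_predict(train_x, alpha, query):
    """Predict as a kernel-weighted sum over the training fingerprints."""
    return sum(a * tanimoto(x, query) for a, x in zip(alpha, train_x))


def rmse(pred, true):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def demo(seed=0, n_bits=16, n_train=40, n_test=10):
    """Fit on synthetic fingerprints; return the test RMSE in log10 units."""
    rng = random.Random(seed)
    # Assumption: each bit shifts the log10 intensity by a fixed random amount.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_bits)]

    def sample():
        bits = [rng.randint(0, 1) for _ in range(n_bits)]
        intensity = 10 ** (3.0 + sum(w * b for w, b in zip(weights, bits)))
        return bits, math.log10(intensity)  # regress on the log intensity

    train = [sample() for _ in range(n_train)]
    test = [sample() for _ in range(n_test)]
    train_x, train_y = [s[0] for s in train], [s[1] for s in train]
    test_x, test_y = [s[0] for s in test], [s[1] for s in test]

    # Centre the targets so the kernel model only has to fit deviations.
    mean_y = sum(train_y) / len(train_y)
    alpha = krr_fit(train_x, [y - mean_y for y in train_y])
    preds = [mean_y + krr_predict(train_x, alpha, q) for q in test_x]
    return rmse(preds, test_y)


if __name__ == "__main__":
    print("test RMSE (log10 intensity):", demo())
```

Working in log10 units means an error of 0.5 corresponds to roughly a factor of three in intensity, which is one way to read the error reported for the (CH3)2CO reagent. The classification step could be sketched analogously with a random forest from a standard ML library, using one binary "detected / not detected" label per ionization method.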