De Novo Molecular Structure Generation from Mass Spectra.

2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023

Abstract
Mass spectrometry is a key technology for identifying small molecules. However, traditional methods that rely on database comparison struggle with newly discovered molecules that are absent from the database. Recent advances in deep learning allow mass spectra to be analyzed directly, which makes it possible to predict chemical structures without a database. We have found that accurately predicting hydrogen atoms is a major challenge in chemical structure prediction, especially because hydrogen atoms are not explicitly represented in SMILES. To address this challenge, we introduce MS2SMILES, a novel approach that treats hydrogen atoms as implicitly linked to heavy atoms. During training, the model therefore learns to predict both heavy atoms and hydrogen atoms accurately, rather than focusing on heavy atoms alone. Additionally, MS2SMILES incorporates the grammatical rules of SMILES when predicting chemical structures, which increases the reliability of the generated SMILES representations. We tested MS2SMILES on the GNPS and CASMI 2016 datasets, where it achieved SMILES prediction accuracies of 53.6% and 63.8%, respectively. These results represent significant improvements of 19.9% and 10.9%, respectively, over the current leading method.
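The abstract does not describe how MS2SMILES encodes hydrogens or enforces SMILES grammar, so the following is only a rough illustration of the two ideas, not the authors' method. It is a minimal Python sketch that pairs each heavy atom in a SMILES string with the number of hydrogens implied by its default valence, and rejects strings with unbalanced branches or unclosed rings. The function name `heavy_atoms_with_hydrogens`, the valence table, and the restriction to a small unbracketed, non-aromatic subset of SMILES are all assumptions made for this example.

```python
import re

# Assumed default valences for the unbracketed atoms this sketch supports.
# Real SMILES allows more (e.g. N may also be pentavalent); this is a simplification.
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[CNOFI]|[-=#()]|\d")


def heavy_atoms_with_hydrogens(smiles):
    """Return [(element, implicit_hydrogen_count), ...] for a simple SMILES.

    Raises ValueError if the string uses unsupported tokens or breaks the
    basic grammar (unbalanced parentheses, unclosed rings, excess valence).
    """
    tokens = TOKEN.findall(smiles)
    if not tokens or "".join(tokens) != smiles:
        raise ValueError("unsupported or empty SMILES: %r" % smiles)

    atoms = []      # element symbol of each heavy atom
    used = []       # total bond order already used by each heavy atom
    branches = []   # stack of atom indices where branches opened
    ring_open = {}  # ring digit -> (atom index, bond order at opening)
    prev = None     # index of the atom the next atom bonds to
    pending = 1     # bond order for the next bond (single by default)

    for tok in tokens:
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
        elif tok == "(":
            if prev is None:
                raise ValueError("branch opened before any atom")
            branches.append(prev)
        elif tok == ")":
            if not branches:
                raise ValueError("unbalanced ')'")
            prev = branches.pop()
        elif tok.isdigit():
            if prev is None:
                raise ValueError("ring digit before any atom")
            if tok in ring_open:
                other, order = ring_open.pop(tok)
                order = max(order, pending)
                used[prev] += order
                used[other] += order
            else:
                ring_open[tok] = (prev, pending)
            pending = 1
        else:  # a heavy atom
            atoms.append(tok)
            used.append(0)
            idx = len(atoms) - 1
            if prev is not None:
                used[prev] += pending
                used[idx] += pending
            prev = idx
            pending = 1

    if branches or ring_open:
        raise ValueError("unclosed branch or ring")

    result = []
    for symbol, bonds in zip(atoms, used):
        hydrogens = DEFAULT_VALENCE[symbol] - bonds
        if hydrogens < 0:
            raise ValueError("valence exceeded on %s" % symbol)
        result.append((symbol, hydrogens))
    return result
```

For ethanol, `heavy_atoms_with_hydrogens("CCO")` pairs the three heavy atoms with 3, 2, and 1 hydrogens, so the hydrogen total of 6 is recovered even though no `H` appears in the string. A production implementation would more likely rely on a cheminformatics toolkit to handle aromatic atoms, bracket atoms, charges, and stereochemistry.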
Keywords
mass spectrometry, molecule identification, chemical structure prediction, SMILES, deep learning