Do Chemformers Dream of Organic Matter? Evaluating a Transformer Model for Multistep Retrosynthesis

Annie M. Westerlund, Siva Manohar Koki, Supriya Kancharla, Alessandro Tibo, Lakshidaa Saigiridharan, Mikhail Kabeshov, Rocio Mercado,Samuel Genheden

JOURNAL OF CHEMICAL INFORMATION AND MODELING(2024)

引用 0|浏览0
暂无评分
摘要
Synthesis planning of new pharmaceutical compounds is a well-known bottleneck in modern drug design. Template-free methods, such as transformers, have recently been proposed as an alternative to template-based methods for single-step retrosynthetic predictions. Here, we trained and evaluated a transformer model, called the Chemformer, for retrosynthesis predictions within drug discovery. The proprietary data set used for training comprised similar to 18 M reactions from literature, patents, and electronic lab notebooks. Chemformer was evaluated for the purpose of both single-step and multistep retrosynthesis. We found that the single-step performance of Chemformer was especially good on reaction classes common in drug discovery, with most reaction classes showing a top-10 round-trip accuracy above 0.97. Moreover, Chemformer reached a higher round-trip accuracy compared to that of a template-based model. By analyzing multistep retrosynthesis experiments, we observed that Chemformer found synthetic routes, leading to commercial starting materials for 95% of the target compounds, an increase of more than 20% compared to the template-based model on a proprietary compound data set. In addition to this, we discovered that Chemformer suggested novel disconnections corresponding to reaction templates, which are not included in the template-based model. These findings were further supported by a publicly available ChEMBL compound data set. The conclusions drawn from this work allow for the design of a synthesis planning tool where template-based and template-free models work in harmony to optimize retrosynthetic recommendations.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要