Retrosynthesis prediction enhanced by in-silico reaction data augmentation
CoRR(2024)
摘要
Recent advances in machine learning (ML) have expedited retrosynthesis
research by assisting chemists to design experiments more efficiently. However,
all ML-based methods consume substantial amounts of paired training data (i.e.,
chemical reaction: product-reactant(s) pair), which is costly to obtain.
Moreover, companies view reaction data as a valuable asset and restrict the
accessibility to researchers. These issues prevent the creation of more
powerful retrosynthesis models due to their data-driven nature. As a response,
we exploit easy-to-access unpaired data (i.e., one component of
product-reactant(s) pair) for generating in-silico paired data to facilitate
model training. Specifically, we present RetroWISE, a self-boosting framework
that employs a base model inferred from real paired data to perform in-silico
reaction generation and augmentation using unpaired data, ultimately leading to
a superior model. On three benchmark datasets, RetroWISE achieves the best
overall performance against state-of-the-art models (e.g., +8.6
on the USPTO-50K test dataset). Moreover, it consistently improves the
prediction accuracy of rare transformations. These results show that Retro-
WISE overcomes the training bottleneck by in-silico reactions, thereby paving
the way toward more effective ML-based retrosynthesis models.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要