Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn Atomic Representations
CoRR(2024)
摘要
Learning molecular representation is a critical step in molecular machine
learning that significantly influences modeling success, particularly in
data-scarce situations. The concept of broadly pre-training neural networks has
advanced fields such as computer vision, natural language processing, and
protein engineering. However, similar approaches for small organic molecules
have not achieved comparable success. In this work, we introduce a novel
pre-training strategy, substrate scope contrastive learning, which learns
atomic representations tailored to chemical reactivity. This method considers
the grouping of substrates and their yields in published substrate scope tables
as a measure of their similarity or dissimilarity in terms of chemical
reactivity. We focus on 20,798 aryl halides in the CAS Content Collection
spanning thousands of publications to learn a representation of aryl halide
reactivity. We validate our pre-training approach through both intuitive
visualizations and comparisons to traditional reactivity descriptors and
physical organic chemistry principles. The versatility of these embeddings is
further evidenced in their application to yield prediction, regioselectivity
prediction, and the diverse selection of new substrates. This work not only
presents a chemistry-tailored neural network pre-training strategy to learn
reactivity-aligned atomic representations, but also marks a first-of-its-kind
approach to benefit from the human bias in substrate scope design.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要