Accelerating Black-Box Molecular Property Optimization by Adaptively Learning Sparse Subspaces
CoRR (2024)
Abstract
Molecular property optimization (MPO) problems are inherently challenging
since they are formulated over discrete, unstructured spaces and the labeling
process involves expensive simulations or experiments, which fundamentally
limits the amount of available data. Bayesian optimization (BO) is a powerful
and popular framework for efficient optimization of noisy, black-box objective
functions (e.g., measured property values) and is thus a potentially attractive
framework for MPO. To apply BO to MPO problems, one must select a structured
molecular representation that enables construction of a probabilistic surrogate
model. Many molecular representations have been developed; however, they are
all high-dimensional, which introduces important challenges in the BO process,
mainly because the curse of dimensionality makes it difficult to define and
perform inference over a suitable class of surrogate models. This challenge has
been recently addressed by learning a lower-dimensional encoding of a SMILES or
graph representation of a molecule in an unsupervised manner and then
performing BO in the encoded space. In this work, we show that such methods
have a tendency to "get stuck," which we hypothesize occurs because the mapping
from the encoded space to property values is not necessarily well-modeled by a
Gaussian process. We argue for an alternative approach that combines numerical
molecular descriptors with a sparse axis-aligned Gaussian process model, which
is capable of rapidly identifying sparse subspaces that are most relevant to
modeling the unknown property function. We demonstrate that our proposed method
substantially outperforms existing MPO methods on a variety of benchmark and
real-world problems. Specifically, we show that our method can routinely find
near-optimal molecules out of a set of more than 100k alternatives within
100 or fewer expensive queries.
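
The following is a minimal sketch, not the authors' implementation, of what BO over a finite pool of molecular descriptor vectors might look like with an axis-aligned Gaussian process kernel. It makes several simplifying assumptions: the per-descriptor relevance weights (`weights`) are fixed by hand, whereas the approach described above would learn them under a sparsity-inducing prior; expected improvement is used as the acquisition function, which the abstract does not specify; and the descriptor matrix and objective are synthetic. All function names (`ard_rbf_kernel`, `gp_posterior`, `expected_improvement`, `run_bo`) are hypothetical.

```python
import math
import numpy as np


def ard_rbf_kernel(A, B, weights):
    """Axis-aligned RBF kernel with one inverse squared lengthscale per descriptor.

    A weight near zero effectively switches that descriptor off.
    """
    Aw = A * np.sqrt(weights)
    Bw = B * np.sqrt(weights)
    sq = (Aw**2).sum(1)[:, None] + (Bw**2).sum(1)[None, :] - 2.0 * Aw @ Bw.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))


def gp_posterior(X_train, y_train, X_cand, weights, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = ard_rbf_kernel(X_train, X_train, weights) + noise * np.eye(len(X_train))
    Ks = ard_rbf_kernel(X_train, X_cand, weights)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.maximum(1.0 - (v**2).sum(0), 1e-12)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement for maximization."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def run_bo(X_pool, objective, weights, n_init=5, n_iter=20, seed=0):
    """Query the pool sequentially, never evaluating the same candidate twice."""
    rng = np.random.default_rng(seed)
    queried = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    y = [float(objective(X_pool[i])) for i in queried]
    for _ in range(n_iter):
        y_arr = np.array(y)
        y_std = (y_arr - y_arr.mean()) / (y_arr.std() + 1e-9)  # standardize labels
        mean, std = gp_posterior(X_pool[queried], y_std, X_pool, weights)
        ei = expected_improvement(mean, std, y_std.max())
        ei[queried] = -np.inf  # exclude candidates that were already evaluated
        nxt = int(np.argmax(ei))
        queried.append(nxt)
        y.append(float(objective(X_pool[nxt])))
    return queried, y


# Synthetic example: 20 descriptors, but only the first two affect the property.
pool = np.random.default_rng(1).random((200, 20))
toy_property = lambda x: -((x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2)
relevance = np.full(20, 1e-3)
relevance[:2] = 25.0  # hand-set here; assumed to be learned in a real sparse model
indices, values = run_bo(pool, toy_property, relevance, n_init=5, n_iter=10)
print("best value found:", max(values))
```

In this toy setup the kernel ignores the 18 irrelevant descriptors almost entirely, so the surrogate effectively works in a two-dimensional subspace. This is meant only to illustrate the intuition behind identifying a sparse subspace, not to reproduce the reported results.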