The Drug-Like Molecule Pre-Training Strategy for Drug Discovery.

IEEE Access (2023)

Abstract
Recent advances in artificial intelligence (AI) have led to transformer-based models that show promise in identifying potential drug molecules for therapeutic purposes. However, for a molecule to be a viable drug candidate, it must exhibit desirable properties such as low toxicity, high druggability, and synthesizability. To address this, we propose an approach that incorporates prior knowledge about these properties into the model training process. In this study, we used the PubChem database, which contains 100 million molecules, and filtered for drug-like molecules based on the quantitative estimate of drug-likeness (QED) score and the Pfizer rule. We then used this filtered dataset of drug-like molecules to pre-train both a molecular representation model (ChemBERTa) and a molecular generation model (MolGPT). To assess the performance of the molecular representation model, we fine-tuned it on the MoleculeNet benchmark datasets; we evaluated the molecular generation model on a generated sample of 10,000 molecules. Despite the limited diversity of the pre-training dataset, the molecular representation models retained at least 90% of their original performance on the benchmark datasets, with an additional improvement of 6% in predicting clinical toxicology. In molecular generation, the model pre-trained on drug-like molecules produced a high rate of desirable molecular properties in its unconditionally generated outputs, and the structural diversity of the generated molecules compared favorably with the conditional generation approach. Moreover, the drug-like molecule pre-training strategy is not tied to a specific model or training method, making it a flexible approach that can be adapted to the criteria of a given research interest.
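The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the QED cutoff of 0.5 is an assumed threshold (the paper does not state one here), and the Pfizer (3/75) rule is applied in its common form, flagging molecules with logP > 3 and TPSA < 75 Å² as likely toxic. In practice the descriptor values would come from a cheminformatics toolkit such as RDKit (`QED.qed`, `Descriptors.MolLogP`, `Descriptors.TPSA`); here they are passed in as precomputed numbers to keep the sketch self-contained.

```python
def is_drug_like(qed: float, logp: float, tpsa: float,
                 qed_cutoff: float = 0.5) -> bool:
    """Return True if a molecule passes the drug-likeness filter.

    qed_cutoff is a hypothetical threshold; the Pfizer (3/75) rule
    rejects molecules with logP > 3 AND TPSA < 75.
    """
    passes_qed = qed >= qed_cutoff
    passes_pfizer = not (logp > 3.0 and tpsa < 75.0)
    return passes_qed and passes_pfizer


# Filter a toy list of (qed, logp, tpsa) descriptor tuples.
molecules = [
    (0.72, 2.1, 88.0),   # drug-like: good QED, passes Pfizer rule
    (0.68, 4.5, 40.0),   # rejected: high logP with low TPSA (Pfizer rule)
    (0.31, 1.8, 95.0),   # rejected: QED below cutoff
]
kept = [m for m in molecules if is_drug_like(*m)]
```

Only the first tuple survives the filter; scaling the same predicate over the 100 million PubChem entries yields the drug-like pre-training corpus the abstract describes.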
Keywords
molecule,discovery,drug-like,pre-training