From Words to Molecules: A Survey of Large Language Models in Chemistry
CoRR (2024)
Abstract
In recent years, Large Language Models (LLMs) have achieved significant
success in natural language processing (NLP) and various interdisciplinary
areas. However, applying LLMs to chemistry is a complex task that requires
specialized domain knowledge. This paper surveys the methodologies used to
integrate LLMs into chemistry, covering both the complexities and the
innovations at this interdisciplinary juncture. Specifically, our analysis
begins by examining how molecular
information is fed into LLMs through various representation and tokenization
methods. We then categorize chemical LLMs into three distinct groups based on
the domain and modality of their input data, and discuss approaches for
integrating these inputs for LLMs. We next review pretraining objectives and
how they are adapted for chemical LLMs. After that, we
explore the diverse applications of LLMs in chemistry, including novel
paradigms for applying them to chemistry tasks. Finally, we identify
promising research directions, including further integration with chemical
knowledge, advancements in continual learning, and improvements in model
interpretability, paving the way for groundbreaking developments in the field.