Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science
CoRR(2024)
摘要
Efficient molecular modeling and design are crucial for the discovery and
exploration of novel molecules, and the incorporation of deep learning methods
has revolutionized this field. In particular, large language models (LLMs)
offer a fresh approach to tackle scientific problems from a natural language
processing (NLP) perspective, introducing a research paradigm called scientific
language modeling (SLM). However, two key issues remain: how to quantify the
match between model and data modalities and how to identify the
knowledge-learning preferences of models. To address these challenges, we
propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263
experiments to assess the model's compatibility with data modalities and
knowledge acquisition. Through the modal transition probability matrix, we
provide insights into the most suitable modalities for tasks. Furthermore, we
introduce a statistically interpretable approach to discover context-specific
knowledge mapping by localized feature filtering. Our pioneering analysis
offers an exploration of the learning mechanism and paves the way for advancing
SLM in molecular science.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要