Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models
arXiv (2023)
Abstract
Accurate and comprehensive material databases extracted from research papers
are crucial for materials science and engineering, but their development
requires significant human effort. With large language models (LLMs)
transforming the way humans interact with text, LLMs provide an opportunity to
revolutionize data extraction. In this study, we demonstrate a simple and
efficient method for extracting materials data from full-text research papers
leveraging the capabilities of LLMs combined with human supervision. This
approach is particularly suitable for mid-sized databases and requires minimal
to no coding or prior knowledge about the extracted property. It offers high
recall and nearly perfect precision in the resulting database. The method is
easily adaptable to new and superior language models, ensuring continued
utility. We show this by evaluating and comparing its performance on GPT-3 and
GPT-3.5/4 (which underlie ChatGPT), as well as free alternatives such as BART
and DeBERTaV3. We provide a detailed analysis of the method's performance in
extracting sentences containing bulk modulus data, achieving up to 90%
precision at 96% recall. We further demonstrate the method's broader
effectiveness by developing a database
of critical cooling rates for metallic glasses over twice the size of previous
human-curated databases.
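
To make the precision and recall figures quoted above concrete, here is a minimal Python sketch of how the two metrics are computed when a language model assigns each sentence a relevance score and sentences at or above a threshold are kept. The function name `precision_recall`, the scores, and the labels are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when sentences scoring >= threshold are kept.

    scores: model relevance score per sentence (hypothetical values).
    labels: True if a human judged the sentence to contain the target data.
    """
    kept = [s >= threshold for s in scores]
    tp = sum(k and l for k, l in zip(kept, labels))        # kept and relevant
    fp = sum(k and not l for k, l in zip(kept, labels))    # kept but irrelevant
    fn = sum((not k) and l for k, l in zip(kept, labels))  # relevant but dropped
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up example: five sentences, three of which truly contain the property.
scores = [0.95, 0.80, 0.60, 0.40, 0.10]
labels = [True, True, False, True, False]

for threshold in (0.5, 0.3):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Lowering the threshold keeps more sentences, which tends to raise recall at the cost of precision. In a human-supervised workflow, a reviewer could then discard the false positives that the lower threshold lets through.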