Discovery of the Hidden World with Large Language Models
CoRR (2024)
Abstract
Science originates with discovering new causal knowledge from a combination
of known facts and observations. Traditional causal discovery approaches mainly
rely on high-quality measured variables, usually given by human experts, to
find causal relations. However, the causal variables are usually unavailable in
a wide range of real-world applications. The rise of large language models
(LLMs), which are trained to learn rich knowledge from massive observations
of the world, provides a new opportunity to assist with discovering high-level
hidden variables from raw observational data. Therefore, we introduce COAT:
Causal representatiOn AssistanT. COAT incorporates LLMs as a factor proposer
that extracts the potential causal factors from unstructured data. Moreover,
LLMs can also be instructed to provide additional information used to collect
data values (e.g., annotation criteria) and to further parse the raw
unstructured data into structured data. The annotated data will be fed to a
causal learning module (e.g., the FCI algorithm) that provides both rigorous
explanations of the data, as well as useful feedback to further improve the
extraction of causal factors by LLMs. We verify the effectiveness of COAT in
uncovering the underlying causal system with two case studies of review rating
analysis and neuropathic diagnosis.
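
The propose, annotate, learn, and feed-back cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation. The names `coat_loop`, `Proposer`, `Annotator`, `toy_propose`, and `toy_annotate` are hypothetical, the two LLM roles are replaced by keyword stubs, and a simple Pearson-correlation screen stands in for the causal learning module (the abstract names FCI as one example; FCI is not implemented here). Feedback is also simplified to passing the factors kept so far back to the proposer.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper).
# In a real system both callables would wrap LLM prompts.
Proposer = Callable[[Sequence[str], Sequence[str]], List[str]]
Annotator = Callable[[str, str], float]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coat_loop(
    texts: Sequence[str],
    target: Sequence[float],
    propose: Proposer,
    annotate: Annotator,
    rounds: int = 2,
    threshold: float = 0.25,
) -> List[str]:
    """Iteratively propose factors, annotate the texts, and keep useful factors."""
    kept: List[str] = []
    for _ in range(rounds):
        # 1. Factor proposal, conditioned on feedback (here: factors kept so far).
        candidates = [f for f in propose(texts, kept) if f not in kept]
        if not candidates:
            break
        for factor in candidates:
            # 2. Annotation: turn unstructured text into one structured column.
            values = [annotate(text, factor) for text in texts]
            # 3. Stand-in for causal learning: a correlation screen, NOT FCI.
            if abs(pearson(values, target)) >= threshold:
                kept.append(factor)
    return kept


# Toy stubs standing in for the LLM (assumptions for illustration only).
def toy_propose(texts: Sequence[str], kept: Sequence[str]) -> List[str]:
    return ["red", "great"] if not kept else ["crisp"]


def toy_annotate(text: str, factor: str) -> float:
    return 1.0 if factor in text else 0.0


reviews = [
    "red apple, great taste, crisp",
    "red apple, bad taste",
    "green apple, great taste",
    "green apple, bad taste, crisp",
]
ratings = [5.0, 1.0, 4.0, 2.0]
kept = coat_loop(reviews, ratings, toy_propose, toy_annotate)
print(kept)
```

In this toy run the colour keyword is uncorrelated with the rating and is dropped, while the taste-related keywords are retained. A faithful version would replace the correlation screen with a proper causal discovery algorithm and derive the feedback from what that algorithm leaves unexplained.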