Cost-Efficient Prompt Engineering for Unsupervised Entity Resolution
arXiv (2023)
Abstract
Entity Resolution (ER) is the problem of semi-automatically determining when
two records refer to the same underlying entity, with applications ranging
from healthcare to e-commerce. Traditional ER solutions have required considerable
manual expertise, including domain-specific feature engineering, as well as
identification and curation of training data. Recently released large language
models (LLMs) provide an opportunity to make ER more seamless and
domain-independent. However, it is also well known that LLMs can pose risks,
and that the quality of their outputs can depend on how prompts are engineered.
Unfortunately, a systematic experimental study on the effects of different
prompting methods for addressing unsupervised ER, using LLMs like ChatGPT, has
been lacking thus far. This paper aims to address this gap by conducting such a
study. We consider some relatively simple and cost-efficient ER prompt
engineering methods and apply them to ER on two real-world datasets widely used
in the community. We use an extensive set of experimental results to show that
an LLM like GPT-3.5 is viable for high-performing unsupervised ER, and
interestingly, that more complicated and detailed (and hence, expensive)
prompting methods do not necessarily outperform simpler approaches. We provide
brief discussions on qualitative and error analysis, including a study of the
inter-consistency of different prompting methods to determine whether they
yield stable outputs. Finally, we consider some limitations of LLMs when
applied to ER.
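As a concrete illustration of the kind of simple, cost-efficient prompting the abstract describes, the sketch below renders a pair of records into a yes/no matching question for an LLM such as GPT-3.5. The template, record fields, and helper names (`build_er_prompt`, `parse_answer`) are illustrative assumptions, not the paper's actual prompts.

```python
# Sketch of a minimal unsupervised ER prompt for an LLM such as GPT-3.5.
# The prompt template and record fields are hypothetical examples,
# not taken from the paper.

def build_er_prompt(record_a: dict, record_b: dict) -> str:
    """Render two records into a yes/no entity-matching prompt."""
    def render(rec: dict) -> str:
        return ", ".join(f"{k}: {v}" for k, v in rec.items())

    return (
        "Do the following two records refer to the same real-world entity?\n"
        f"Record A: {render(record_a)}\n"
        f"Record B: {render(record_b)}\n"
        "Answer with 'yes' or 'no' only."
    )

def parse_answer(completion: str) -> bool:
    """Map the LLM's free-text reply to a match/non-match decision."""
    return completion.strip().lower().startswith("yes")

# Example pair of records that a human would likely judge a match.
prompt = build_er_prompt(
    {"name": "Tommasi's Cafe", "city": "San Jose"},
    {"name": "Tommasi Cafe", "city": "San Jose, CA"},
)
```

The prompt string would then be sent to the LLM's completion endpoint, and `parse_answer` applied to the reply; more elaborate variants (few-shot examples, chain-of-thought instructions) cost more tokens per pair, which is the trade-off the paper studies.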