ReMatch: Retrieval Enhanced Schema Matching with LLMs
CoRR (2024)
Abstract
Schema matching is a crucial task in data integration, involving the
alignment of a source database schema with a target schema to establish
correspondence between their elements. This task is challenging due to textual
and semantic heterogeneity, as well as differences in schema sizes. Although
machine-learning-based solutions have been explored in numerous studies, they
often suffer from low accuracy, require manually mapped schemas for model
training, or need access to source schema data, which may be unavailable due
to privacy concerns. In this paper, we present a novel method, named ReMatch,
for matching schemas using retrieval-enhanced Large Language Models (LLMs). Our
method requires no predefined mappings, no model training, and no access to
data in the source database. In ReMatch, the tables of the target
schema and the attributes of the source schema are first represented as
structured, passage-based documents. For each source attribute document, we
retrieve the J target-table documents that are most semantically relevant to it.
Next, we create a prompt for every source table that contains all of its
attributes and their descriptions, together with all attributes of the top-J
target tables retrieved in the previous step. An LLM is then prompted with
this input to perform the matching, yielding a ranked list of K
potential matches for each source attribute. Our experimental results on large
real-world schemas demonstrate that ReMatch significantly improves matching
capabilities and outperforms other machine learning approaches. By eliminating
the need for training data, ReMatch provides a viable solution for
real-world scenarios.
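The retrieve-then-prompt flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy schemas and all function names are hypothetical, a bag-of-words cosine similarity stands in for a real semantic embedding model, the candidate set for a source table is assumed to be the union of the top-J tables retrieved for each of its attributes, and the final LLM call is omitted (the sketch only builds the prompt that would be sent).

```python
import math
import re
from collections import Counter

# Hypothetical toy schemas: table name -> {attribute: description}.
TARGET = {
    "person": {"person_id": "unique identifier of a person",
               "birth_date": "date of birth of the person",
               "gender": "gender of the person"},
    "visit_occurrence": {"visit_id": "identifier of a hospital visit",
                         "visit_start_date": "date the visit started"},
    "drug_exposure": {"drug_id": "identifier of a drug",
                      "dose": "amount of drug given"},
}
SOURCE = {
    "patients": {"patient_id": "unique identifier of the patient",
                 "dob": "date of birth"},
}


def table_document(table, attrs):
    """Structured passage describing one target table and its attributes."""
    return "\n".join([f"Table: {table}"] + [f"- {a}: {d}" for a, d in attrs.items()])


def attribute_document(table, attr, desc):
    """Structured passage describing one source attribute."""
    return f"Table: {table}\nAttribute: {attr}\nDescription: {desc}"


def embed(text):
    # Assumption: bag-of-words term counts stand in for a semantic embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def retrieve_tables(query_doc, target_docs, j):
    """Return the names of the J target tables most similar to the query document."""
    q = embed(query_doc)
    ranked = sorted(target_docs, key=lambda t: cosine(q, embed(target_docs[t])), reverse=True)
    return ranked[:j]


def build_prompt(src_table, src_attrs, target, j, k):
    """Build one matching prompt for a whole source table."""
    target_docs = {t: table_document(t, attrs) for t, attrs in target.items()}
    # Assumption: candidates = union of top-J tables retrieved per source attribute.
    candidates = []
    for attr, desc in src_attrs.items():
        for t in retrieve_tables(attribute_document(src_table, attr, desc), target_docs, j):
            if t not in candidates:
                candidates.append(t)
    lines = [f"Match each attribute of source table '{src_table}' to the target attributes below.",
             f"For each source attribute, return a ranked list of the {k} most likely target attributes.",
             "", "Source attributes:"]
    lines += [f"- {src_table}.{a}: {d}" for a, d in src_attrs.items()]
    lines += ["", "Candidate target attributes:"]
    for t in candidates:
        lines += [f"- {t}.{a}: {d}" for a, d in target[t].items()]
    return "\n".join(lines)


prompt = build_prompt("patients", SOURCE["patients"], TARGET, j=1, k=3)
print(prompt)  # In ReMatch, a prompt like this would be sent to an LLM (call not shown).
```

With J=1, only the `person` table is retrieved for both source attributes, so the prompt stays small. Restricting the candidate attributes to the retrieved tables is what keeps the prompt size manageable for large target schemas.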