SeLaB: Semantic Labeling with BERT

2021 International Joint Conference on Neural Networks (IJCNN), 2021

Abstract
Automatically generating schema labels for the column values of data tables has many data science applications, such as schema matching, data discovery, and data linking. For example, automatically extracted tables with missing headers can be filled in with predicted schema labels, which significantly reduces human effort. Furthermore, the predicted labels can reduce the impact of inconsistent column names across multiple data tables. In this paper, we propose a context-aware semantic labeling method that uses both the data values and the contextual information of columns. Our method formulates semantic labeling as a structured prediction problem, in which we sequentially predict labels for an input table with missing headers. We incorporate both the values and the context of each data column using the pre-trained contextualized language model BERT. To our knowledge, we are the first to successfully adapt BERT to the semantic labeling task. We evaluate our approach on two real-world datasets from different domains and demonstrate substantial improvements over state-of-the-art feature-based methods in terms of evaluation metrics.
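The abstract does not give the exact input encoding, but the core idea of combining a column's values with the context of already-predicted headers can be illustrated with a minimal sketch. The function name, the serialization format, and the `[EMPTY]` placeholder below are assumptions, not the authors' implementation:

```python
# Hedged sketch (not the paper's code): serializing one table column into a
# BERT-style two-segment input for sequential header prediction.
# The serialization format and placeholder token are illustrative assumptions.

def serialize_column(values, predicted_headers, max_values=8):
    """Build a BERT-style input string: the column's cell values form one
    segment, and the headers predicted so far (the table context) form the
    other, so each new prediction can condition on earlier ones."""
    value_text = " ".join(str(v) for v in values[:max_values])
    # Earlier predictions supply the context; an empty table has none yet.
    context_text = " ".join(predicted_headers) if predicted_headers else "[EMPTY]"
    return f"[CLS] {value_text} [SEP] {context_text} [SEP]"

# First column of a table: no context is available yet.
first = serialize_column(["Boston", "Austin"], [])
# Second column: the header predicted for the first column becomes context.
second = serialize_column(["02115", "73301"], ["city"])
```

In a full system, each serialized string would be tokenized and fed to a fine-tuned BERT classifier over the label vocabulary; predicting columns one at a time is what makes the formulation a structured prediction problem rather than independent per-column classification.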
Keywords
semantic labeling, pretrained language model, data table