Semi-supervised Instance Matching Using Boosted Classifiers

Extended Semantic Web Conference (2015)

Abstract
Instance matching concerns identifying pairs of instances that refer to the same underlying entity. Current state-of-the-art instance matchers use machine learning methods. Supervised learning systems achieve good performance by training on significant amounts of manually labeled samples. To alleviate the labeling effort, this paper presents a minimally supervised instance matching approach that is able to deliver competitive performance using only 2% training data and little parameter tuning. As a first step, the classifier is trained in an ensemble setting using boosting. Iterative semi-supervised learning is used to improve the performance of the boosted classifier even further, by re-training it on the most confident samples labeled in the current iteration. Empirical evaluations on a suite of six publicly available benchmarks show that the proposed system outcompetes optimization-based minimally supervised approaches in 1-7 iterations. The system's average F-Measure is shown to be within 2.5% of that of recent supervised systems that require more training samples for effective performance.
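
The two-step idea in the abstract (train a boosted classifier on a small labeled seed, then repeatedly re-train it on its own most confident predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain AdaBoost over decision stumps, treats each candidate instance pair as a vector of similarity scores, and uses the magnitude of the ensemble score as the confidence. All names (`train_adaboost`, `self_train`, `top_k`, the toy feature vectors) are hypothetical and chosen only for illustration.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    # A decision stump: one feature compared against one threshold.
    return polarity if x[feature] >= threshold else -polarity

def train_adaboost(X, y, rounds=10):
    # Plain AdaBoost over decision stumps; labels are +1 (match) / -1 (non-match).
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            values = sorted({x[f] for x in X})
            # Candidate thresholds: midpoints between neighbouring feature values.
            candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
            for t in candidates:
                for pol in (1, -1):
                    err = sum(w for x, label, w in zip(X, y, weights)
                              if stump_predict(x, f, t, pol) != label)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol)
        err, f, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, pol))
        # Up-weight the samples this stump got wrong, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(x, f, t, pol))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, x):
    # Signed ensemble score; its magnitude serves as the confidence here.
    return sum(a * stump_predict(x, f, t, p) for a, f, t, p in model)

def predict(model, x):
    return 1 if score(model, x) >= 0 else -1

def self_train(X_lab, y_lab, X_unlab, iterations=3, top_k=2, rounds=10):
    # Iterative self-training: label the pool, keep the top_k most confident
    # predictions as new training samples, and re-train the boosted model.
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = train_adaboost(X_lab, y_lab, rounds)
    for _ in range(iterations):
        if not pool:
            break
        pool.sort(key=lambda x: abs(score(model, x)), reverse=True)
        confident, pool = pool[:top_k], pool[top_k:]
        for x in confident:
            X_lab.append(x)
            y_lab.append(predict(model, x))
        model = train_adaboost(X_lab, y_lab, rounds)
    return model

# Toy data (assumed): each vector holds two similarity scores for one instance pair.
seed_X = [[0.9, 0.8], [0.1, 0.2]]
seed_y = [1, -1]
unlabeled = [[0.95, 0.9], [0.05, 0.1], [0.7, 0.75], [0.3, 0.2]]

model = self_train(seed_X, seed_y, unlabeled)
print(predict(model, [0.85, 0.9]), predict(model, [0.15, 0.1]))
```

In this sketch the pseudo-labeled samples are added to the training set permanently. A real system would also need a stopping criterion and some guard against confidently wrong labels reinforcing themselves, neither of which the abstract specifies.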
Keywords
Instance matching, Semi-supervised learning, Boosting