Unsupervised Entity Resolution Method Based on Random Forest

WEB INFORMATION SYSTEMS AND APPLICATIONS (WISA 2021) (2021)

Abstract
Entity resolution is the task of finding records that describe the same real-world entity, so as to eliminate duplicate data. This paper proposes an unsupervised entity resolution method based on machine learning. The method first uses an LSTM to convert records into vectors that carry semantic information. Next, an improved random forest method maps the records into an n-dimensional space to partition them, and records in the same partition are considered to refer to the same entity. Finally, an improved Affinity Propagation (AP) clustering algorithm clusters the partitions to determine whether records in different partitions refer to the same entity. Experiments on real data sets demonstrate the effectiveness of the method for entity resolution tasks.
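The three stages described in the abstract (embed, partition, cluster the partitions) can be sketched roughly as follows. This is only an illustration under loud assumptions, not the authors' implementation: a hashed character-trigram vector stands in for the LSTM encoder, a few shared random axis-aligned splits stand in for the improved random forest, and plain textbook Affinity Propagation stands in for the improved variant. All function names and parameters (`embed`, `partition`, `affinity_propagation`, `resolve`, `dim`, `n_splits`) are hypothetical.

```python
import math
import random
import zlib
from collections import defaultdict


def embed(record, dim=32):
    """Stand-in for the LSTM encoder: hash character trigrams into a unit vector."""
    text = f" {record.lower()} "
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        # crc32 is stable across runs, unlike the built-in hash() on strings
        vec[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def partition(vectors, n_splits=9, seed=0):
    """Stand-in for the improved random forest: random (feature, threshold)
    splits; records with the same split signature share a partition."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    splits = []
    for _ in range(n_splits):
        f = rng.randrange(dim)
        lo, hi = min(v[f] for v in vectors), max(v[f] for v in vectors)
        splits.append((f, rng.uniform(lo, hi)))
    groups = defaultdict(list)
    for idx, v in enumerate(vectors):
        groups[tuple(v[f] > t for f, t in splits)].append(idx)
    return list(groups.values())


def affinity_propagation(S, damping=0.5, iters=100):
    """Textbook AP on similarity matrix S; returns an exemplar index per item."""
    n = len(S)
    if n == 1:
        return [0]
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        for i in range(n):  # responsibility update
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        for i in range(n):  # availability update
            for k in range(n):
                pos = sum(max(0.0, R[j][k]) for j in range(n) if j not in (i, k))
                new = pos if i == k else min(0.0, R[k][k] + pos)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


def resolve(records):
    """Return one entity label per record (the index of an exemplar partition)."""
    vectors = [embed(r) for r in records]
    parts = partition(vectors)
    dim = len(vectors[0])
    cents = [[sum(vectors[i][d] for i in p) / len(p) for d in range(dim)]
             for p in parts]
    n = len(cents)
    # Similarity between partitions: negative squared Euclidean distance
    S = [[-sum((a - b) ** 2 for a, b in zip(cents[i], cents[j]))
          for j in range(n)] for i in range(n)]
    off = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
    pref = off[len(off) // 2] if off else 0.0  # median similarity as preference
    for i in range(n):
        S[i][i] = pref
    exemplars = affinity_propagation(S)
    labels = [0] * len(records)
    for p_idx, members in enumerate(parts):
        for i in members:
            labels[i] = exemplars[p_idx]
    return labels
```

Calling `resolve(["Apple iPhone 12 64GB", "apple iphone 12 64gb", "Samsung Galaxy S21"])` returns one label per record. Records that differ only in letter case receive identical embeddings here and therefore always share a label; how near-duplicates are grouped depends on the random splits and on the preference value.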
Keywords
Entity resolution, Random forest, LSTM, Affinity Propagation, Clustering