Incremental Clustering on Linked Data.

ICDM Workshops (2018)

Abstract
Data integration in the Web of Data is not limited to the pairwise linking of entities but often requires clustering entities from different sources, e.g., within knowledge graphs. Such entity clustering should not only scale to large data volumes and many sources but also be dynamic, so that it can deal with continuously changing sources and incorporate new ones. Previous entity clustering approaches are mostly static, focusing on the one-time linking and clustering of entities from only a few sources. In this paper, we propose and evaluate new scalable approaches for incremental entity clustering that support the continuous addition of new entities and data sources. The implementation is based on the distributed processing framework Apache Flink. A detailed performance evaluation with real and synthetically customized datasets shows the effectiveness and scalability of the incremental clustering approaches.
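The abstract does not spell out the clustering algorithm. As a rough, hypothetical illustration of what "incremental" means here, the sketch below merges each new batch of entities into the existing clusters without recomputing them: a new entity joins the most similar compatible cluster, or else starts a new one. The token-Jaccard similarity, the 0.5 threshold, the rule that a cluster holds at most one entity per source, and all names (`Entity`, `Cluster`, `add_entities`) are assumptions for illustration, not the paper's method; the code also runs in a single process rather than on Apache Flink.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    source: str  # data source the entity comes from
    label: str   # hypothetical attribute used for similarity


@dataclass
class Cluster:
    members: list = field(default_factory=list)

    def sources(self) -> set:
        return {e.source for e in self.members}


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity of two labels (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_similarity(entity: Entity, cluster: Cluster) -> float:
    # Score a cluster by its best-matching member.
    return max(jaccard(entity.label, m.label) for m in cluster.members)


def add_entities(clusters: list, new_entities: list, threshold: float = 0.5) -> list:
    """Incrementally merge a batch of new entities into existing clusters.

    Existing clusters are never recomputed: each new entity either joins
    the most similar compatible cluster or becomes a new singleton cluster.
    """
    for entity in new_entities:
        best, best_sim = None, -1.0
        for cluster in clusters:
            # Assumed constraint: at most one entity per source in a cluster.
            if entity.source in cluster.sources():
                continue
            sim = cluster_similarity(entity, cluster)
            if sim >= threshold and sim > best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append(Cluster([entity]))
        else:
            best.members.append(entity)
    return clusters


# Usage: source A arrives first, source B is added later.
clusters = add_entities([], [Entity("a1", "A", "Leipzig"), Entity("a2", "A", "Dresden")])
clusters = add_entities(clusters, [Entity("b1", "B", "Leipzig city"), Entity("b2", "B", "Berlin")])
print([[e.id for e in c.members] for c in clusters])
# [['a1', 'b1'], ['a2'], ['b2']]
```

This greedy, one-pass assignment is cheap because old clusters are only extended, but its result can depend on the arrival order of entities; a real system would likely need repair steps or a more global strategy for that.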
Keywords
Runtime, Scalability, Conferences, Distributed processing, Urban areas, Clustering algorithms, Data mining