The missing links: discovering hidden same-as links among a billion of triples

iiWAS '10: Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services (2010)

The Semantic Web is constantly gaining momentum, as more and more Web sites and content providers adopt its principles. At the core of these principles lies the Linked Data movement, which demands that data on the Web be annotated and linked across different sources instead of being isolated in data silos. To materialize this vision of a web of semantics, existing resource identifiers should be reused and shared between different Web sites. This is not always the case in the current state of the Semantic Web, since multiple identifiers are, more often than not, redundantly introduced for the same resources. In this paper we introduce a novel approach to automatically detect redundant identifiers solely by matching the URIs of information resources. The approach, based on a common pattern among Semantic Web URIs, provides a simple and practical method for duplicate detection. We apply this method to a large snapshot of the current Semantic Web comprising 1.15 billion statements and estimate the number of hidden duplicates in it. The outcomes of our experiments confirm the effectiveness as well as the efficiency of our method, and suggest that URI matching can be used as a scalable filter for discovering implicit same-as links.
Keywords: redundant identifiers, practical method, multiple identifiers, semantic web URIs, missing link, hidden same-as link, current semantic web, URI matching, semantic web, web site, existing resource identifiers, different web site, linked data, data integrity, information integration, entity resolution
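
The abstract does not spell out which URI pattern the method relies on, so the following is only a minimal sketch of what URI-based duplicate filtering could look like, not the authors' algorithm. It assumes, purely for illustration, that the trailing identifier of a URI (the fragment, or else the last path segment) names the resource, and it groups URIs from different hosts that share that identifier as candidate same-as pairs. The function names `local_name` and `candidate_same_as` and the example URIs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, unquote


def local_name(uri: str) -> str:
    """Return a normalized trailing identifier for a URI.

    Assumption (not from the paper): the fragment, or else the last
    path segment, identifies the resource.
    """
    parts = urlsplit(uri)
    tail = parts.fragment or parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Decode percent-escapes, unify spaces/underscores, ignore case.
    return unquote(tail).replace(" ", "_").lower()


def candidate_same_as(uris):
    """Group URIs by local name; keep groups that span more than one host."""
    groups = defaultdict(set)
    for uri in uris:
        key = local_name(uri)
        if key:
            groups[key].add(uri)
    return {
        key: sorted(members)
        for key, members in groups.items()
        if len({urlsplit(u).netloc for u in members}) > 1
    }


if __name__ == "__main__":
    sample = [
        "http://example.org/resource/Berlin",
        "http://example.net/place/berlin",
        "http://example.com/onto#Berlin",
        "http://example.net/place/Paris",
    ]
    for key, members in candidate_same_as(sample).items():
        print(key, members)
```

A filter of this kind only proposes candidates: URIs that share a local name can still denote different resources, so the resulting groups would need a further verification step before any same-as link is asserted.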