RADON2 - a buffered-intersection matrix computing approach to accelerate link discovery over geo-spatial RDF knowledge bases - OAEI2018 results.

OM@ISWC(2018)

Abstract
Geospatial data is at the essence of the Semantic Web, where a knowledge base such as LinkedGeoData consists of more than 30 billion facts. Reasoning over these considerable amounts of geospatial data suffers from a lack of efficient methods for computing links between the resources contained in these knowledge bases. In this paper, we present the participation of an extension of the Radon algorithm (dubbed Radon2) in the OAEI 2018 campaign. The OAEI results show that Radon2 outperforms the other state-of-the-art systems in most cases.

1 Presentation of the System

We present an extension of the Radon algorithm [8, 6] (dubbed Radon2), which computes all topological relations of DE-9IM in order to accelerate topological relation discovery among geospatial resources.

1.1 State, Purpose and General Statement

In the following, we start by formally defining the general link discovery problem. Thereafter, we formally define the problem of link discovery of topological relations, which is tackled by Radon2.

Link Discovery. Let K be a finite RDF knowledge base. K can be regarded as a set of triples (s, p, o) ∈ (R ∪ B) × P × (R ∪ L ∪ B), where R is the set of all resources, B the set of all blank nodes, P the set of all predicates and L the set of all literals. The Link Discovery (LD) problem can be expressed as follows: given two sets of resources S and T (for example, hotels and water bodies) and a relation r (e.g., :touches), find all pairs (s, t) ∈ S × T such that r(s, t) holds. The result is produced as a set of links called a mapping:

$$M_{S,T} = \{(s_i, r, t_j) \mid s_i \in S,\ t_j \in T,\ r(s_i, t_j)\ \text{holds}\}$$

Optionally, a similarity score sim ∈ [0, 1] calculated by an LD tool can be added to each entry of a mapping to express the confidence in a computed link. Finding solutions for the LD problem is challenging due to the typically large volume of current datasets as well as their semantic heterogeneity.
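To make the LD definition concrete, here is a minimal brute-force sketch, assuming resources are identified by strings and the relation r is given as a Python predicate. The function `link_discovery`, the rectangle data in `RECTS` and the `intersects` predicate (valid only for axis-aligned rectangles) are hypothetical illustrations, not part of Radon, Radon2 or Limes. The sketch returns the mapping as a set of (s_i, r, t_j) triples for which r holds.

```python
from typing import Callable, Iterable, Set, Tuple

Link = Tuple[str, str, str]  # (s_i, r, t_j)


def link_discovery(S: Iterable[str], T: Iterable[str], r_name: str,
                   r: Callable[[str, str], bool]) -> Set[Link]:
    """Brute-force LD: evaluate r(s, t) on every pair of S x T (|S|*|T| tests)."""
    targets = list(T)  # materialise T so it can be scanned once per source
    return {(s, r_name, t) for s in S for t in targets if r(s, t)}


# Hypothetical resources given as axis-aligned rectangles (xmin, ymin, xmax, ymax).
RECTS = {
    ":hotel1": (0, 0, 2, 2),
    ":hotel2": (5, 5, 6, 6),
    ":lake1": (2, 0, 4, 2),      # shares the edge x = 2 with :hotel1
    ":lake2": (10, 10, 12, 12),
}


def intersects(s: str, t: str) -> bool:
    """Closed rectangles intersect iff their extents overlap on both axes."""
    ax0, ay0, ax1, ay1 = RECTS[s]
    bx0, by0, bx1, by1 = RECTS[t]
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


M = link_discovery([":hotel1", ":hotel2"], [":lake1", ":lake2"], ":intersects", intersects)
print(sorted(M))  # [(':hotel1', ':intersects', ':lake1')]
```

The nested loop makes the quadratic cost of naive LD explicit. On datasets the size of LinkedGeoData, this cost is what Radon's sparse index and bounding-box filtering are meant to avoid.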
The main purpose of LD approaches is to meet two requirements: (1) high effectiveness (i.e., maximizing a fitness function such as the F-measure) and (2) high efficiency (i.e., minimizing runtime).

Link Discovery of Topological Relations. The Dimensionally Extended nine-Intersection Model (DE-9IM) [3] is a topological model and a standard used to describe the spatial relations between two geometries in two-dimensional space. Since the spatial relations expressed by DE-9IM are topological, they are invariant to rotation, translation and scaling transformations [4]. The DE-9IM model is based on a 3 × 3 intersection matrix of the form:

$$
\mathrm{DE9IM}(g_1, g_2) =
\begin{pmatrix}
\dim(I(g_1) \cap I(g_2)) & \dim(I(g_1) \cap B(g_2)) & \dim(I(g_1) \cap E(g_2)) \\
\dim(B(g_1) \cap I(g_2)) & \dim(B(g_1) \cap B(g_2)) & \dim(B(g_1) \cap E(g_2)) \\
\dim(E(g_1) \cap I(g_2)) & \dim(E(g_1) \cap B(g_2)) & \dim(E(g_1) \cap E(g_2))
\end{pmatrix}
\tag{1}
$$

where dim is the maximum number of dimensions of the intersection ∩ of the interior (I), boundary (B) or exterior (E) of the two geometries g1 and g2. The domain of dim is {−1, 0, 1, 2}, where −1 indicates no intersection, 0 an intersection consisting of one or more points, 1 an intersection made up of lines, and 2 an intersection that results in an area. A simplified binary version of dim(x) with the domain {true, false} is obtained using the Boolean function β, where β(dim(x)) = false iff dim(x) = −1, and true otherwise. Only a subset of the topological relations obtainable through DE-9IM reflects the semantics of the English language [3, 2], namely equals, within, contains, disjoint, touches, meets, covers, coveredBy, intersects, crosses and overlaps.

1.2 Specific Techniques Used

In this section, we discuss the main idea behind our new extension of Radon.

Radon2 vs. Radon.
The basic idea behind the original Radon approach [8] for topological relation discovery is to combine an indexing method with space tiling, which allows for the efficient computation of topological relations between geospatial resources. In particular, Radon introduces a novel sparse index for geospatial resources. Then, based on the bounding boxes of the indexed geospatial resources, Radon applies a strategy for discarding unnecessary computations of DE-9IM relations. In Radon2, we focus on optimizing the computation of the intersection matrix (IM) used in the DE-9IM standard. In the original Radon, the IM is computed anew for each topological relation, while in Radon2 we compute the IM only once for all relations between the same pair of resources and then apply the mask of each relation to the computed IM. In particular, we buffer the IM of each pair of geometries so that all topological relations of that pair can be retrieved without recomputing its IM. This strategy saves the time otherwise spent recomputing the IM for each individual topological relation. Moreover, computing the IM once per pair of geometries for all topological relations does not affect the completeness of the linking result; i.e., the F-measure of Radon2 is the same as that of Radon, which is always 1.

1.3 Adaptations Made for the Evaluation

No specific adaptations were made to the original Radon algorithm; we only provide a Java SystemAdapter according to the campaign guidelines.³

³ https://project-hobbit.eu/challenges/om2017/om2017-tasks/

1.4 Link to the System

Both Radon and Radon2 are implemented in the link discovery framework Limes. Limes is available under the GNU Affero General Public License v3.0. The Radon2 source code is available online from the project website, which also provides a user manual as well as a developer manual.
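The IM buffering described for Radon2 in Section 1.2, together with the DE-9IM masks and the binary function β, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Java implementation in Limes: the class `BufferedIM`, the injected `compute_im` callback and the sample matrices in `SAMPLE_IMS` are hypothetical. IMs are encoded as 9-character row-major strings over {F, 0, 1, 2} (F standing for dim = −1), the encoding that geometry engines such as JTS and Shapely use for `relate` results. The masks are the widely used patterns from the OGC Simple Features specification, where T means β(dim) = true, F means dim = −1 and * matches anything. Crosses, overlaps and meets are omitted here, since the usual crosses and overlaps masks depend on the dimensions of the input geometries.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Standard DE-9IM masks, row-major over (II, IB, IE, BI, BB, BE, EI, EB, EE).
# 'T' = non-empty (beta true), 'F' = empty (dim = -1), '*' = anything.
# A relation holds if the IM matches ANY of its masks.
MASKS: Dict[str, List[str]] = {
    "equals":    ["T*F**FFF*"],
    "disjoint":  ["FF*FF****"],
    "touches":   ["FT*******", "F**T*****", "F***T****"],
    "within":    ["T*F**F***"],
    "contains":  ["T*****FF*"],
    "covers":    ["T*****FF*", "*T****FF*", "***T**FF*", "****T*FF*"],
    "coveredBy": ["T*F**F***", "*TF**F***", "**FT*F***", "**F*TF***"],
}


def beta(d: str) -> bool:
    """Binary version of dim: false iff the intersection is empty ('F', i.e. dim = -1)."""
    return d != "F"


def cell_matches(d: str, p: str) -> bool:
    """Compare one IM entry d against one mask symbol p."""
    if p == "*":
        return True
    if p == "T":
        return beta(d)
    if p == "F":
        return not beta(d)
    return d == p  # mask demands an exact dimension: '0', '1' or '2'


def holds(im: str, relation: str) -> bool:
    """Apply the mask(s) of `relation` to an already computed IM."""
    if relation == "intersects":  # intersects is the negation of disjoint
        return not holds(im, "disjoint")
    return any(all(cell_matches(d, p) for d, p in zip(im, mask))
               for mask in MASKS[relation])


class BufferedIM:
    """Hypothetical cache: one IM per (s, t) pair, reused by every relation."""

    def __init__(self, compute_im: Callable[[Hashable, Hashable], str]):
        self._compute_im = compute_im  # the expensive geometric step
        self._cache: Dict[Tuple[Hashable, Hashable], str] = {}
        self.computations = 0  # how often compute_im was actually called

    def im(self, s: Hashable, t: Hashable) -> str:
        key = (s, t)
        if key not in self._cache:
            self._cache[key] = self._compute_im(s, t)
            self.computations += 1
        return self._cache[key]

    def relations(self, s: Hashable, t: Hashable, rels: List[str]) -> List[str]:
        """Return every relation in `rels` that holds between s and t."""
        im = self.im(s, t)  # computed at most once per pair
        return [r for r in rels if holds(im, r)]


# Hypothetical precomputed IMs standing in for a real geometry engine.
SAMPLE_IMS = {
    ("a", "b"): "2FF1FF212",  # polygon a lies strictly inside polygon b
    ("a", "c"): "FF2FF1212",  # polygons a and c are disjoint
}
buf = BufferedIM(lambda s, t: SAMPLE_IMS[(s, t)])
rels = ["equals", "disjoint", "touches", "within",
        "contains", "covers", "coveredBy", "intersects"]
print(buf.relations("a", "b", rels))  # ['within', 'coveredBy', 'intersects']
print(buf.relations("a", "c", rels))  # ['disjoint']
print(buf.relations("a", "b", ["within"]))  # ['within'], served from the cache
print(buf.computations)  # 2
```

Because the cache is keyed by the (s, t) pair, the expensive IM computation is paid once per candidate pair no matter how many relations are queried. Only the cheap mask comparison is repeated per relation, and the answers are identical to recomputing the IM each time. This matches the claim above that buffering does not change the linking result.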