Efficient computation of comprehensive statistical information of large OWL datasets: a scalable approach

ENTERPRISE INFORMATION SYSTEMS(2023)

Abstract
Computing dataset statistics is crucial for exploring the structure of a dataset; however, it becomes challenging for large-scale datasets. Such statistics have several key benefits, supporting link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present the first attempt at developing a distributed approach (OWLStats) for collecting comprehensive statistics over large-scale OWL datasets. OWLStats is a distributed in-memory approach that uses Apache Spark to compute 50 statistical criteria for OWL datasets. We have successfully integrated OWLStats into the SANSA framework. Experimental results show that OWLStats scales linearly with both the number of nodes and the size of the data.
Keywords
Distributed processing, in-memory approach, SANSA framework, scalable architecture, Semantic Web, statistics computations