An unsupervised approach for acquiring ontologies and RDF data from online life science databases

SEMANTIC WEB: RESEARCH AND APPLICATIONS, PT 2, PROCEEDINGS(2010)

Abstract
In the Linked Open Data cloud, one of the largest data sets, comprising 2.5 billion triples, is derived from the Life Science domain. Yet this represents a small fraction of the total number of publicly available data sources on the Web. We briefly describe past attempts to transform specific Life Science sources from a plethora of open as well as proprietary formats into RDF data. In particular, we identify and tackle two bottlenecks in current practice: acquiring ontologies to formally describe these data, and creating “RDFizer” programs to convert data from legacy formats into RDF. We propose an unsupervised method, based on transformation rules, for performing these two key tasks; it builds on our previous work on unsupervised wrapper induction for extracting labelled data from complete Life Science Web sites. We apply our approach to 13 real-world online Life Science databases. The learned ontologies are evaluated by domain experts as well as against gold standard ontologies. Furthermore, we compare the learned ontologies against ontologies that are “lifted” directly from the underlying relational schema using an existing unsupervised approach. Finally, we apply our approach to three online databases to extract RDF data. Our results indicate that this approach can be used to bootstrap and speed up the migration of Life Science data into the Linked Open Data cloud.
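To make the notion of an “RDFizer” concrete, the following is a minimal sketch of turning one labelled record (field label to value, as a wrapper might extract it) into N-Triples lines. It is an illustration only, not the transformation rules proposed in the paper: the `http://example.org/` namespace, the `rdfize` and `escape_literal` helpers, and the rule "one label becomes one property" are all assumptions made for this example.

```python
from typing import Dict, List
from urllib.parse import quote

# Hypothetical namespace; a real RDFizer would use the source's own URIs.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples string literals require escaping."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rdfize(record_id: str, record: Dict[str, str], cls: str) -> List[str]:
    """Convert one labelled record into a list of N-Triples lines.

    Assumed rule: each field label becomes a property URI and each
    value becomes a plain string literal.
    """
    subject = f"<{BASE}resource/{quote(record_id, safe='')}>"
    class_uri = f"<{BASE}ontology/{quote(cls, safe='')}>"
    triples = [f"{subject} <{RDF_TYPE}> {class_uri} ."]
    for label, value in record.items():
        # Normalise the label into a URI-safe local name.
        local = quote(label.strip().lower().replace(" ", "_"), safe="")
        triples.append(
            f'{subject} <{BASE}ontology/{local}> "{escape_literal(value)}" .'
        )
    return triples


if __name__ == "__main__":
    for line in rdfize("P12345", {"Gene Name": "BRCA1"}, "Protein"):
        print(line)
```

In this sketch the class name (`Protein`) is supplied by hand; in the approach described above, the ontology itself is learned rather than given.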
Keywords
online life science databases,life science domain,rdf data,existing unsupervised approach,largest data set,labelled data,available data source,life science databases,linked open data cloud,life science data,complete life science web,gold standard,linked open data