Scalable semantic web data management using vertical partitioning

VLDB, pp. 411-422 (2007)


Abstract

Efficient management of RDF data is an important factor in realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and exp...

Introduction
  • The Semantic Web is an effort by the W3C [8] to enable integration and sharing of data across different applications and organizations.
  • RDF models data as a graph of resources connected by labeled property arcs; this graph can be represented using XML syntax (RDF/XML).
  • This is typically the format for RDF data exchange; structurally, the graph can be parsed into a series of triples, each representing a statement of the form <subject, property, object>, which is the notation the authors follow in this paper.
  • To represent the fact that Serge Abiteboul, Rick Hull, and Victor Vianu wrote a book called “Foundations of Databases”, the authors would use seven triples, beginning with: person1 isNamed “Serge Abiteboul”
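
The triple notation above can be sketched in Python as plain tuples. Only the first triple (person1 isNamed “Serge Abiteboul”) is quoted in the text; the other six are an assumed completion of the seven, and the property names hasAuthor and isTitled, like the helper authors_of, are hypothetical names chosen for illustration.

```python
# Each RDF statement is a (subject, property, object) triple.
# Only the first triple is quoted in the text; the remaining six are an
# assumed completion, and hasAuthor / isTitled are hypothetical names.
triples = [
    ("person1", "isNamed", "Serge Abiteboul"),
    ("person2", "isNamed", "Rick Hull"),
    ("person3", "isNamed", "Victor Vianu"),
    ("book1", "hasAuthor", "person1"),
    ("book1", "hasAuthor", "person2"),
    ("book1", "hasAuthor", "person3"),
    ("book1", "isTitled", "Foundations of Databases"),
]


def authors_of(title, data):
    """Return the sorted names of everyone who wrote the book with this title."""
    # Step 1: find the book resources carrying this title.
    books = {s for s, p, o in data if p == "isTitled" and o == title}
    # Step 2: follow hasAuthor arcs from those books to person resources.
    people = {o for s, p, o in data if p == "hasAuthor" and s in books}
    # Step 3: look up the name of each person.
    return sorted(o for s, p, o in data if p == "isNamed" and s in people)


print(authors_of("Foundations of Databases", triples))
```

Answering even this simple question takes three passes over the same collection of triples, which is the access pattern that becomes a chain of self-joins in a relational triple-store.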
Highlights
  • The Semantic Web is an effort by the W3C [8] to enable integration and sharing of data across different applications and organizations
  • The property table and vertical partitioning approaches both perform a factor of 2-3 faster than the triple-store approach (the geometric mean of their query times was 38 and 36 seconds respectively, compared with 97 seconds for the triple-store approach)
  • The emergence of the Semantic Web necessitates high-performance data management tools to manage the tremendous collections of RDF data being produced
  • The previously proposed “property table” optimization has not been adopted in most RDF databases, perhaps due to its complexity and inability to handle multi-valued attributes
  • We showed that a poorly-selected property table can result in a factor of 3.8 slowdown over an optimal property table, making the solution difficult to use in practice
  • We review the state of the art for improving performance for RDF databases and consider a recent suggestion, “property tables.” We discuss practically and empirically why this solution has undesirable features
  • As an alternative to property tables, we proposed vertically partitioning tables and demonstrated that they achieve performance similar to that of property tables in a row-oriented database, while being simpler to implement
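
A minimal sketch of the vertical partitioning idea named above, assuming it means one two-column (subject, object) table per property. The function name vertically_partition, the choice to sort each table by subject, and the sample rows are assumptions of this sketch, not details taken from the summary.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Split a list of triples into one two-column table per property.

    Each table holds (subject, object) pairs. A multi-valued property is
    simply several rows with the same subject, so no special handling is
    needed for it.
    """
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    # Sorting by subject is an assumption of this sketch.
    return {prop: sorted(rows) for prop, rows in tables.items()}


# Hypothetical sample data, not taken from the paper's dataset.
sample = [
    ("book1", "hasAuthor", "person2"),
    ("book1", "isTitled", "Foundations of Databases"),
    ("book1", "hasAuthor", "person1"),
    ("person1", "isNamed", "Serge Abiteboul"),
]
tables = vertically_partition(sample)
for prop, rows in tables.items():
    print(prop, rows)
```

Here hasAuthor ends up with two rows for book1, which illustrates why multi-valued attributes, a difficulty noted for property tables, fit this layout without extra machinery.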
Results
  • The performance numbers for all seven queries on the four architectures are shown in Figure 3.
  • The property table and vertical partitioning approaches both perform a factor of 2-3 faster than the triple-store approach (the geometric mean of their query times was 38 and 36 seconds respectively, compared with 97 seconds for the triple-store approach).
  • The triple-store only performs a factor of two slower since it does not have to perform any joins for this query.
  • The authors noted that the type property table in Postgres takes 472MB compared to just 100MB in C-Store.
  • This is almost entirely because the Postgres tuple header is 27 bytes, compared with just 8 bytes of actual data per tuple, so the Postgres table scan needs to read 35 bytes per tuple compared with just 8 for C-Store
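
The figures in the bullets above can be checked with a few lines of arithmetic. This is only a consistency check on the quoted numbers; the variable names are hypothetical.

```python
# Geometric-mean query times in seconds, as quoted above.
triple_store_s = 97
property_table_s = 38
vertical_s = 36

# Speedups over the triple-store; both should fall in the "factor of 2-3" range.
property_table_speedup = triple_store_s / property_table_s
vertical_speedup = triple_store_s / vertical_s

# Per-tuple bytes for the type property table, as quoted above.
header_bytes = 27  # Postgres tuple header
data_bytes = 8     # actual data per tuple
postgres_bytes_per_tuple = header_bytes + data_bytes

# How much more a Postgres scan reads per tuple than C-Store does.
scan_ratio = postgres_bytes_per_tuple / data_bytes

# Ratio of the reported table sizes (472MB vs 100MB).
size_ratio = 472 / 100

print(round(property_table_speedup, 2), round(vertical_speedup, 2))
print(postgres_bytes_per_tuple, scan_ratio, size_ratio)
```

The per-tuple ratio (35 / 8) accounts for most, though not all, of the 472MB to 100MB size ratio, which matches the "almost entirely" wording.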
Conclusion
  • The emergence of the Semantic Web necessitates high-performance data management tools to manage the tremendous collections of RDF data being produced.
  • Current state-of-the-art RDF databases (triple-stores) scale extremely poorly since most queries require multiple self-joins on the triples table.
  • As an alternative to property tables, the authors proposed vertically partitioning tables and demonstrated that they achieve performance similar to that of property tables in a row-oriented database, while being simpler to implement.
  • The authors showed that on a version of the C-Store column-oriented database, it is possible to achieve a factor of 32 performance improvement over the current state-of-the-art triple-store design.
  • Queries that used to take hundreds of seconds can be run in less than ten seconds, a significant step toward interactive-time semantic web content storage and querying
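
To make the self-join point concrete, here is a small sketch using Python's built-in sqlite3 module. The schema, the table and column names (triples, subj, prop, obj), and the sample rows are all assumptions for illustration; the paper's experiments used Postgres and C-Store, not SQLite, and this toy example says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Triple-store layout: every statement lives in one three-column table.
    CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT);
    INSERT INTO triples VALUES
        ('book1', 'isTitled', 'Foundations of Databases'),
        ('book1', 'hasAuthor', 'person1'),
        ('person1', 'isNamed', 'Serge Abiteboul');

    -- Vertically partitioned layout: one (subj, obj) table per property.
    CREATE TABLE isTitled  AS SELECT subj, obj FROM triples WHERE prop = 'isTitled';
    CREATE TABLE hasAuthor AS SELECT subj, obj FROM triples WHERE prop = 'hasAuthor';
    CREATE TABLE isNamed   AS SELECT subj, obj FROM triples WHERE prop = 'isNamed';
""")

# Triple-store: the single triples table is joined with itself twice.
triple_store_sql = """
    SELECT n.obj
    FROM triples t
    JOIN triples a ON a.subj = t.subj AND a.prop = 'hasAuthor'
    JOIN triples n ON n.subj = a.obj  AND n.prop = 'isNamed'
    WHERE t.prop = 'isTitled' AND t.obj = 'Foundations of Databases'
"""

# Vertical partitioning: the same question joins three small
# per-property tables and needs no filtering on a prop column.
vertical_sql = """
    SELECT n.obj
    FROM isTitled t
    JOIN hasAuthor a ON a.subj = t.subj
    JOIN isNamed n   ON n.subj = a.obj
    WHERE t.obj = 'Foundations of Databases'
"""

triple_store_rows = conn.execute(triple_store_sql).fetchall()
vertical_rows = conn.execute(vertical_sql).fetchall()
print(triple_store_rows, vertical_rows)
```

Both queries return the same rows. The difference is that the triple-store version scans and joins the full triples table each time, whereas the partitioned version touches only the tables for the properties the query mentions.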
Tables
  • Table 1: Some sample RDF data and possible property tables
  • Table 2: Query times (in seconds) for Q5 and Q6 after the
  • Table 3: Query times in seconds comparing a wider than necessary property table to the property table containing only the columns required for the query. % Slowdown =
Funding
  • This work was supported by the National Science Foundation under grants IIS-048124, CNS-0520032, IIS-0325703 and two NSF Graduate Research Fellowships
References
  • Library catalog data. http://simile.mit.edu/rdf-test-data/barton/.
  • Longwell website. http://simile.mit.edu/longwell/.
  • Redland RDF Application Framework. http://librdf.org/.
  • Simile website. http://simile.mit.edu/.
  • Swoogle. http://swoogle.umbc.edu/.
  • Uniprot rdf dataset. http://dev.isb-sib.ch/projects/uniprot-rdf/.
  • Wordnet rdf dataset. http://www.cogsci.princeton.edu/~wn/.
  • World Wide Web Consortium (W3C). http://www.w3.org/.
  • RDF Primer. W3C Recommendation. http://www.w3.org/TR/rdf-primer, 2004.
  • RDQL - A Query Language for RDF. W3C Member Submission 9 January 2004. http://www.w3.org/Submission/RDQL/, 2004.
  • SPARQL Query Language for RDF. W3C Working Draft 4 October 2006. http://www.w3.org/TR/rdf-sparql-query/, 2006.
  • D. Abadi, A. Marcus, S. Madden, and K. Hollenbach. Using the Barton libraries dataset as an RDF benchmark. Technical Report MIT-CSAIL-TR-2007-036, MIT.
  • D. J. Abadi. Column stores for wide and sparse data. In CIDR, 2007.
  • D. J. Abadi, S. Madden, and M. Ferreira. Integrating Compression and Execution in Column-Oriented Database Systems. In SIGMOD, 2006.
  • D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. R. Madden. Materialization strategies in a column-oriented DBMS. In Proc. of ICDE, 2007.
  • R. Agrawal, A. Somani, and Y. Xu. Storage and Querying of E-Commerce Data. In VLDB, 2001.
  • J. Beckmann, A. Halverson, R. Krishnamurthy, and J. Naughton. Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format. In ICDE, 2006.
  • P. A. Boncz and M. L. Kersten. MIL primitives for querying a fragmented world. VLDB Journal, 8(2):101–119, 1999.
  • P. A. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-pipelining query execution. In CIDR, pages 225–237, 2005.
  • V. Bonstrom, A. Hinze, and H. Schweppe. Storing RDF as a graph. In Proc. of LA-WEB, 2003.
  • J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In ISWC, pages 54–68, 2002.
  • E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An Efficient SQL-based RDF Querying Scheme. In VLDB, pages 1216–1227, 2005.
  • G. P. Copeland and S. N. Khoshafian. A decomposition storage model. In Proc. of SIGMOD, pages 268–279, 1985.
  • J. Corwin, A. Silberschatz, P. L. Miller, and L. Marenco. Dynamic tables: An architecture for managing evolving, heterogeneous biomedical data in relational database management systems. Journal of the American Medical Informatics Association, 14(1):86–93, 2007.
  • D. Florescu and D. Kossmann. Storing and querying XML data using an RDMBS. IEEE Data Eng. Bull., 22(3):27–34, 1999.
  • S. Harris and N. Gibbins. 3store: Efficient bulk RDF storage. In Proc. of PSSS’03, pages 1–15, 2003.
  • J. M. Hellerstein, J. F. Naughton, and A. Pfeffer. Generalized search trees for database systems. In Proc. of VLDB 1995, Zurich, Switzerland, pages 562–573.
  • R. MacNicol and B. French. Sybase IQ Multiplex - Designed For Analytics. In VLDB, pages 1227–1230, 2004.
  • J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, and J. F. Naughton. Relational databases for querying XML documents: Limitations and opportunities. In Proc. of VLDB, pages 302–314, 1999.
  • M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O’Neil, P. E. O’Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-Store: A column-oriented DBMS. In VLDB, pages 553–564, 2005.
  • K. Wilkinson. Jena property table implementation. In SSWS, 2006.
  • K. Wilkinson, C. Sayers, H. Kuno, and D. Reynolds. Efficient RDF Storage and Retrieval in Jena2. In SWDB, pages 131–150, 2003.