Scalable semantic web data management using vertical partitioning
VLDB, pp. 411-422 (2007)
Efficient management of RDF data is an important factor in realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and exp...
- The Semantic Web is an effort by the W3C to enable integration and sharing of data across different applications and organizations.
- RDF data forms a graph, which can be represented using XML syntax (RDF/XML).
- RDF/XML is typically the format for RDF data exchange; structurally, the graph can be parsed into a series of triples, each representing a statement of the form <subject, property, object>, which is the notation the authors follow in this paper.
- To represent the fact that Serge Abiteboul, Rick Hull, and Victor Vianu wrote a book called “Foundations of Databases”, the authors would use seven triples, the first being: person1 isNamed “Serge Abiteboul”
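
The triple notation above can be sketched in a few lines of Python. This is only an illustration: the first triple comes from the text, while the remaining six (and the names `book1`, `hasAuthor`, and `isTitled`) are assumptions about how the statement might be decomposed, not a quotation from the paper.

```python
# Only the first triple appears in the text above. The other six are an
# assumed decomposition; book1, hasAuthor, and isTitled are hypothetical names.
triples = [
    ("person1", "isNamed", "Serge Abiteboul"),
    ("person2", "isNamed", "Rick Hull"),
    ("person3", "isNamed", "Victor Vianu"),
    ("book1", "hasAuthor", "person1"),
    ("book1", "hasAuthor", "person2"),
    ("book1", "hasAuthor", "person3"),
    ("book1", "isTitled", "Foundations of Databases"),
]


def objects_of(subject, prop):
    """Return every object o such that <subject, prop, o> is a stored triple."""
    return [o for s, p, o in triples if s == subject and p == prop]


def author_names(book):
    """Follow hasAuthor edges, then isNamed edges, to collect author names."""
    names = []
    for person in objects_of(book, "hasAuthor"):
        names.extend(objects_of(person, "isNamed"))
    return names
```

Calling `author_names("book1")` walks two edges of the graph per author, which hints at why queries over a flat list of triples tend to need repeated lookups against the same collection.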
- The previously proposed “property table” optimization has not been adopted in most RDF databases, perhaps due to its complexity and inability to handle multi-valued attributes.
- The authors showed that a poorly selected property table can result in a factor of 3.8 slowdown over an optimal property table, making the solution difficult to use in practice.
- The authors review the state of the art for improving the performance of RDF databases and consider a recent suggestion, “property tables.” They discuss, practically and empirically, why this solution has undesirable features.
- The performance numbers for all seven queries on the four architectures are shown in Figure 3.
- The property table and vertical partitioning approaches are both a factor of 2-3 faster than the triple-store approach (the geometric means of their query times were 38 and 36 seconds, respectively, compared with 97 seconds for the triple-store approach).
- The triple-store only performs a factor of two slower since it does not have to perform any joins for this query.
- The authors noted that the type property table takes 472MB in Postgres compared to just 100MB in C-Store.
- This is almost entirely because each Postgres tuple carries a 27-byte header on top of just 8 bytes of actual data, so the Postgres table scan needs to read 35 bytes per tuple compared with just 8 for C-Store.
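
The figures quoted above can be checked with a little arithmetic. The sketch below uses only numbers stated in the text; the variable names are chosen here for illustration.

```python
# All constants below are figures quoted in the text above.
POSTGRES_HEADER_BYTES = 27
DATA_BYTES = 8
POSTGRES_TABLE_MB = 472
CSTORE_TABLE_MB = 100

# Bytes a Postgres scan reads per tuple: header plus actual data.
postgres_bytes_per_tuple = POSTGRES_HEADER_BYTES + DATA_BYTES
# Per-tuple overhead relative to C-Store, which reads only the data bytes.
per_tuple_ratio = postgres_bytes_per_tuple / DATA_BYTES
# Observed ratio of the two table sizes.
table_size_ratio = POSTGRES_TABLE_MB / CSTORE_TABLE_MB

# Geometric-mean query times in seconds, as quoted.
TRIPLE_STORE_SECONDS = 97
PROPERTY_TABLE_SECONDS = 38
VERTICAL_PARTITION_SECONDS = 36

speedup_property_table = TRIPLE_STORE_SECONDS / PROPERTY_TABLE_SECONDS
speedup_vertical_partition = TRIPLE_STORE_SECONDS / VERTICAL_PARTITION_SECONDS

print(f"bytes per Postgres tuple: {postgres_bytes_per_tuple}")
print(f"per-tuple ratio: {per_tuple_ratio:.3f}, table-size ratio: {table_size_ratio:.2f}")
print(f"speedups: {speedup_property_table:.2f}x, {speedup_vertical_partition:.2f}x")
```

The per-tuple ratio (35/8) comes out slightly below the table-size ratio (472/100), which is consistent with the header overhead explaining almost all, but not quite all, of the size difference. Both speedups fall inside the stated factor of 2-3.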
- The emergence of the Semantic Web necessitates high-performance data management tools to manage the tremendous collections of RDF data being produced.
- Current state-of-the-art RDF databases – triple-stores – scale extremely poorly since most queries require multiple self-joins on the triples table.
- As an alternative to property tables, the authors proposed vertically partitioning tables and demonstrated that they achieve similar performance as property tables in a row-oriented database, while being simpler to implement.
- The authors showed that on a version of the C-Store column-oriented database, it is possible to achieve a factor of 32 performance improvement over the current state of the art triple store design.
- Queries that used to take hundreds of seconds can be run in less than ten seconds, a significant step toward interactive-time semantic web content storage and querying.
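
The contrast between the two layouts can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming that vertical partitioning means one two-column (subject, object) table per property; the sample rows, the property names, and the `titles_by` helper are hypothetical. SQLite is a row-oriented engine, so the sketch shows only the shape of the schemas and queries, not the performance results reported for C-Store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Triple-store layout: a single three-column table holds every statement.
cur.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
sample_rows = [  # hypothetical toy data
    ("person1", "isNamed", "Serge Abiteboul"),
    ("book1", "hasAuthor", "person1"),
    ("book1", "isTitled", "Foundations of Databases"),
    ("book2", "hasAuthor", "person1"),
    ("book2", "isTitled", "Example Title"),
]
cur.executemany("INSERT INTO triples VALUES (?, ?, ?)", sample_rows)

# On the triple store, "titles of books by a named author" joins the
# triples table with itself twice.
TRIPLE_STORE_SQL = """
SELECT t.obj
FROM triples AS a
JOIN triples AS n ON n.subj = a.obj
JOIN triples AS t ON t.subj = a.subj
WHERE a.prop = 'hasAuthor'
  AND n.prop = 'isNamed' AND n.obj = ?
  AND t.prop = 'isTitled'
ORDER BY t.obj
"""

# Vertically partitioned layout: one (subj, obj) table per property,
# loaded in subject order and indexed on the subject column.
properties = [row[0] for row in
              cur.execute("SELECT DISTINCT prop FROM triples ORDER BY prop").fetchall()]
for prop in properties:
    # Property names are interpolated as identifiers; acceptable only
    # because this toy data set is fixed and trusted.
    cur.execute(f'CREATE TABLE "{prop}" (subj TEXT, obj TEXT)')
    cur.execute(
        f'INSERT INTO "{prop}" SELECT subj, obj FROM triples '
        "WHERE prop = ? ORDER BY subj",
        (prop,),
    )
    cur.execute(f'CREATE INDEX "idx_{prop}_subj" ON "{prop}" (subj)')

# The same query now touches only the three property tables it needs.
VERTICAL_SQL = """
SELECT t.obj
FROM hasAuthor AS a
JOIN isNamed AS n ON n.subj = a.obj
JOIN isTitled AS t ON t.subj = a.subj
WHERE n.obj = ?
ORDER BY t.obj
"""


def titles_by(sql, author):
    """Run one of the two queries above and return the matching titles."""
    return [row[0] for row in conn.execute(sql, (author,))]
```

Both queries return the same titles for the same author. The difference is that the partitioned version reads only the tables for the properties it mentions, whereas the triple-store version filters the full triples table for each join input.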
- Table 1: Some sample RDF data and possible property tables
- Table 2: Query times (in seconds) for Q5 and Q6 after the
- Table 3: Query times in seconds comparing a wider than necessary property table to the property table containing only the columns required for the query. % Slowdown =
- This work was supported by the National Science Foundation under grants IIS-048124, CNS-0520032, IIS-0325703 and two NSF Graduate Research Fellowships.