Building efficient query engines in a high-level language

PVLDB, no. 10 (2014): 853-864


Abstract

In this paper we advocate that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming la...

Introduction
  • Software specialization is becoming increasingly important for overcoming performance issues in complex software systems [25].
  • In the context of database management systems, it has been noted that query engines do not, to date, match the performance of handwritten code [33].
  • All previous query compilers are based on code template expansion, a technique that generates code directly, in one step, from the query plan by replacing each operator node by its code template.
  • In its purest form, template expansion makes cross-operator code optimization inside the query compiler impossible.
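
To make the template-expansion idea above concrete, here is a minimal sketch in Python. The plan nodes (`Scan`, `Select`), the `expand` function, and the C-like output syntax are all hypothetical names chosen for illustration; they are not taken from LegoBase or from any existing query compiler.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical query-plan nodes, for illustration only.
@dataclass
class Scan:
    table: str


@dataclass
class Select:
    child: "Plan"
    predicate: str  # C-like boolean expression over the row variable `r`


Plan = Union[Scan, Select]


def expand(node: Plan, consume: str) -> str:
    """Replace each operator by its code template in a single pass.

    `consume` is the code the parent operator wants to run for every row.
    Templates are plain strings, so once two operators have been pasted
    together the compiler has no structured view of the combined code.
    """
    if isinstance(node, Scan):
        return f"for (r in {node.table}) {{ {consume} }}"
    if isinstance(node, Select):
        return expand(node.child, f"if ({node.predicate}) {{ {consume} }}")
    raise TypeError(f"unknown plan node: {node!r}")


plan = Select(Scan("lineitem"), "r.qty > 10")
print(expand(plan, "print(r);"))
# for (r in lineitem) { if (r.qty > 10) { print(r); } }
```

Because each template only sees the opaque string handed down by its parent, an optimization that spans the scan and the selection has nowhere to hook in, which is the limitation the bullet above describes.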
Highlights
  • We evaluate our approach with the TPC-H benchmark and show that: (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler; (b) these performance improvements require programming just a few hundred lines of high-level code instead of the complicated low-level code required by existing query compilers; and (c) the compilation overhead is low compared to the overall execution time, making our approach usable in practice for efficiently compiling query engines
  • We configure the Java Virtual Machine to run with 192GB of heap space
  • Our approach admits a productivity/efficiency combination that is not feasible with existing low-level query compilers: Programmers need to develop just a few hundred lines of high-level code to implement techniques and optimizations that result in significant performance improvements
  • Our experiments show that LegoBase significantly outperforms both a commercial in-memory database and an existing query compiler
Results
  • The authors' experimental platform consists of a server-type x86 machine equipped with two Intel Xeon E5-2620 v2 CPUs running at 2 GHz each, 256 GB of DDR3 RAM at 1600 MHz, and two commodity 2 TB hard disks storing the experimental datasets.
  • TPC-H is a data-warehousing and decision support benchmark that issues business analytics queries to a database with sales information.
  • This benchmark suite includes 22 queries with a high degree of complexity that express most SQL features.
  • As a reference point for all results presented, the authors use a commercial, in-memory, row-store database system called DBX, which does not employ compilation.
  • As described in Section 2, LegoBase uses query plans from the DBX database.
Conclusion
  • LegoBase is a new analytical database system currently under development at EPFL. In this paper, the authors presented the current prototype of the query execution subsystem of LegoBase.
  • The authors' system allows programmers to develop high-level abstractions without having to pay an abstraction penalty
  • To achieve this vision of abstraction without regret, LegoBase performs source-to-source compilation of the high-level Scala code to very efficient low-level C code.
  • It uses state-of-the-art compiler technology in the form of an extensible staging compiler implemented as a library in which optimizations can be expressed naturally at a high level.
  • The authors' experiments show that LegoBase significantly outperforms both a commercial in-memory database and an existing query compiler
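
The idea of expressing optimizations at a high level and only then emitting low-level C can be illustrated with a toy sketch. This is not LegoBase's staging compiler: the expression classes (`Const`, `Var`, `BinOp`), the `simplify` pass, and `emit_function` are hypothetical names, and the only operators assumed are `+` and `*`.

```python
from dataclasses import dataclass


# Hypothetical intermediate representation, for illustration only.
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str        # assumed to be "+" or "*"
    left: object
    right: object


def simplify(e):
    """High-level optimization pass: fold constants and drop `* 1`, bottom-up."""
    if not isinstance(e, BinOp):
        return e
    l, r = simplify(e.left), simplify(e.right)
    if isinstance(l, Const) and isinstance(r, Const):
        return Const(l.value * r.value if e.op == "*" else l.value + r.value)
    if e.op == "*" and r == Const(1):
        return l
    if e.op == "*" and l == Const(1):
        return r
    return BinOp(e.op, l, r)


def to_c(e) -> str:
    """Emit a C expression for an already simplified tree."""
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    return f"({to_c(e.left)} {e.op} {to_c(e.right)})"


def emit_function(name: str, param: str, body) -> str:
    """Optimize on the structured tree first, then generate C source text."""
    return f"long {name}(long {param}) {{ return {to_c(simplify(body))}; }}"


body = BinOp("+", BinOp("*", Var("price"), Const(1)), BinOp("*", Const(2), Const(3)))
print(emit_function("f", "price", body))
# long f(long price) { return (price + 6); }
```

The design point of the sketch is that the optimization works on a structured program representation rather than on strings, so it can be written in a few lines of high-level code before any C is produced.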
Tables
  • Table 1: Programming effort required for each LegoBase component along with the average speedup obtained from using it
Related work
  • We outline related work in three areas: (a) previous query compilers, (b) frameworks for applying intra-operator optimizations, and (c) orthogonal techniques to speed up query processing. We briefly discuss these areas below.

    Previous Compilation Frameworks. Historically, System R [2] first proposed code generation for query optimization. However, the Volcano iterator model eventually dominated over compilation, since code generation was very expensive to maintain. The Daytona [5] system revisited compilation in the late nineties; however, it relied heavily on the operating system for functionality that is traditionally provided by the DBMS itself, such as buffering.
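
For readers unfamiliar with the Volcano iterator model mentioned above, the following is a minimal sketch of its open/next/close interface in Python. The class and function names (`Scan`, `Select`, `run`) are hypothetical, and the in-memory list stands in for a stored table.

```python
class Scan:
    """Leaf operator: yields rows from an in-memory list one at a time."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Select:
    """Filter operator: pulls rows from its child until one passes the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self):
        self.child.close()


def run(plan):
    """Drive the plan by repeatedly calling next() on the root operator."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


print(run(Select(Scan([1, 5, 12, 20]), lambda r: r > 10)))
# [12, 20]
```

Every row crosses one `next()` call per operator in this style, which is easy to maintain and extend but is the kind of interpretation overhead that query compilation aims to remove.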
Funding
  • This work was supported by ERC grant 279804
References
  • D. J. Abadi, S. Madden, and N. Hachem. Column stores vs. Row stores: How Different Are They Really? In ACM SIGMOD, pages 967–980, 2008.
  • D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J. N. Gray, W. F. King, B. G. Lindsay, R. Lorie, J. W. Mehl, T. G. Price, F. Putzolu, P. G. Selinger, M. Schkolnick, D. R. Slutz, I. L. Traiger, B. W. Wade, and R. A. Yost. A history and evaluation of System R. Comm. ACM, 24(10):632–646, 1981.
  • F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. SAP HANA: data management for modern business applications. SIGMOD Record, 40(4):45–51, 2012.
  • G. Graefe. Volcano-an extensible and parallel query evaluation system. IEEE Transactions on Knowledge and Data Engineering, 6(1):120–135, 1994.
  • R. Greer. Daytona and the fourth-generation language Cymbal. In ACM SIGMOD, pages 525–526, 1999.
  • L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh. Extensible Query Processing in Starburst. In ACM SIGMOD, pages 377–388, 1989.
  • S. Harizopoulos, V. Liang, D. J. Abadi, and S. Madden. Performance tradeoffs in read-optimized databases. In VLDB, pages 487–498, 2006.
  • G. C. Hunt and J. R. Larus. Singularity: Rethinking the Software Stack. SIGOPS Oper. Syst. Rev., 41(2):37–49, 2007.
  • R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: a high-performance, distributed main memory transaction processing system. PVLDB, 1(2):1496–1499, 2008.
  • C. Koch. Abstraction without regret in data management systems. In CIDR, 2013.
  • C. Koch. Abstraction without regret in database systems building: a manifesto. IEEE Data Eng. Bull., 37(1):70–79, 2014.
  • K. Krikellas, S. Viglas, and M. Cintra. Generating code for holistic query evaluation. In ICDE, pages 613–624, 2010.
  • C. Lattner. LLVM: An Infrastructure for Multi-Stage Optimization. http://llvm.org/.
  • S. Manegold, M. L. Kersten, and P. Boncz. Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB, 2(2):1648–1653, 2009.
  • T. Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, 4(9):539–550, 2011.
  • M. Odersky and M. Zenger. Scalable Component Abstractions. In OOPSLA, pages 41–57, 2005.
  • Oracle Corporation. TimesTen Database Architecture. http://download.oracle.com/otn_hosted_doc/timesten/603/TimesTen-Documentation/arch.pdf.
  • S. Padmanabhan, T. Malkemus, A. Jhingran, and R. Agarwal. Block oriented processing of relational database operations in modern computer architectures. In ICDE, pages 567–574, 2001.
  • V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-Time Query Processing. In ICDE, pages 60–69, 2008.
  • J. Rao, H. Pirahesh, C. Mohan, and G. Lohman. Compiled Query Execution Engine using JVM. In ICDE, pages 23–, 2006.
  • T. Rompf and M. Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In Generative Programming and Component Engineering, pages 127–136, 2010. http://scala-lms.github.io/.
  • T. Rompf, A. K. Sujeeth, N. Amin, K. J. Brown, V. Jovanovic, H. Lee, M. Jonnalagedda, K. Olukotun, and M. Odersky. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging. In POPL, pages 497–510, 2013.
  • J. Sompolski, M. Zukowski, and P. Boncz. Vectorization vs. compilation in query execution. In DaMoN, pages 33–40, 2011.
  • M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O’Neil, P. O’Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A Column-oriented DBMS. In VLDB, pages 553–564, 2005.
  • M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In ICDE, pages 2–11, 2005.
  • M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it’s time for a complete rewrite). In VLDB, pages 1150–1160, 2007.
  • W. Taha and T. Sheard. MetaML and multi-stage programming with explicit annotations. Theor. Comput. Sci., 248(1-2):211–242, 2000.
  • Transaction Processing Performance Council. TPC-H, a decision support benchmark. http://www.tpc.org/tpch.
  • B. M. Zane, J. P. Ballard, F. D. Hinshaw, D. A. Kirkpatrick, and L. Premanand Yerabothu. Optimized SQL code generation, 2008. US Patent 7430549 B2.
  • R. Zhang, S. Debray, and R. T. Snodgrass. Micro-specialization: dynamic code specialization of database management systems. In Code Generation and Optimization, pages 63–73, 2012.
  • R. Zhang, R. Snodgrass, and S. Debray. Application of Micro-specialization to Query Evaluation Operators. In ICDE Workshops, pages 315–321, 2012.
  • R. Zhang, R. Snodgrass, and S. Debray. Micro-Specialization in DBMSes. In ICDE, pages 690–701, 2012.
  • M. Zukowski, P. A. Boncz, N. Nes, and S. Héman. MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull., (2):17–22, 2005.