This paper shows that the N-ary Storage Model (NSM) has a negative effect on data cache performance, and introduces PAX, a new data page layout for relational DBMSs.

Weaving Relations for Cache Performance

VLDB, pp. 169-180 (2001)


Abstract

Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and performance are becoming increasingly important on modern platforms. In this paper, we first...

Introduction
  • The communication between the CPU and the secondary storage (I/O) has been traditionally recognized as the major database performance bottleneck.
  • To optimize data transfer to and from mass storage, relational DBMSs have long organized records in slotted disk pages using the N-ary Storage Model (NSM).
  • NSM stores records contiguously starting from the beginning of each disk page, and uses an offset table at the end of the page to locate the beginning of each record [27].
  • Most queries use only a fraction of each record.
  • To minimize unnecessary I/O, the Decomposition Storage Model (DSM) was proposed in 1985 [10].
  • Queries that involve multiple attributes from a relation must spend...
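
The NSM layout described above (records packed from the start of the page, an offset table growing from the end) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class name `SlottedPage`, the 256-byte page size, and the 2-byte little-endian offsets are all assumptions made for the example.

```python
import struct

PAGE_SIZE = 256   # assumption: a tiny page, for illustration only
SLOT_SIZE = 2     # assumption: each offset-table entry is a 2-byte integer


class SlottedPage:
    """Toy NSM page: record bytes grow forward, record offsets grow backward."""

    def __init__(self, size=PAGE_SIZE):
        self.buf = bytearray(size)
        self.free_start = 0   # first unused byte after the last record
        self.n_records = 0

    def _slot_pos(self, i):
        # Offset-table entry i sits SLOT_SIZE * (i + 1) bytes before the page end.
        return len(self.buf) - SLOT_SIZE * (i + 1)

    def insert(self, record: bytes) -> int:
        slot_pos = self._slot_pos(self.n_records)
        if self.free_start + len(record) > slot_pos:
            raise ValueError("page full")
        # Store the record contiguously after the previous one.
        self.buf[self.free_start:self.free_start + len(record)] = record
        # Remember where it begins in the offset table at the end of the page.
        struct.pack_into("<H", self.buf, slot_pos, self.free_start)
        self.free_start += len(record)
        self.n_records += 1
        return self.n_records - 1

    def read(self, i: int) -> bytes:
        if not 0 <= i < self.n_records:
            raise IndexError(i)
        start = struct.unpack_from("<H", self.buf, self._slot_pos(i))[0]
        if i + 1 < self.n_records:
            end = struct.unpack_from("<H", self.buf, self._slot_pos(i + 1))[0]
        else:
            end = self.free_start
        return bytes(self.buf[start:end])
```

Because every attribute of a record sits next to the others, a scan that needs only one attribute still walks over the bytes of all of them, which is the cache behavior the paper examines.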
Highlights
  • The communication between the CPU and the secondary storage (I/O) has been traditionally recognized as the major database performance bottleneck
  • The N-ary Storage Model (NSM) stores records contiguously starting from the beginning of each disk page, and uses an offset table at the end of the page to locate the beginning of each record [27]
  • The graph shows that, while the performance of the NSM and PAX schemes is relatively insensitive to the changes, the performance of the Decomposition Storage Model (DSM) is very sensitive to the number of attributes used in the query
  • The first part discusses in detail the cache behavior of NSM and PAX, while the second part presents a performance sensitivity analysis for NSM and PAX as the query projectivity and the number of attributes in the predicate and the relation vary
  • This paper shows that NSM has a negative effect on data cache performance, and introduces PAX (Partition Attributes Across), a new data page layout for relational DBMSs
  • When compared to NSM, PAX incurs 75% less data cache stall time, while range selection queries and updates on main-memory tables execute in 17-25% less elapsed time
Methods
  • To store a relation with degree n, PAX partitions each page into n minipages.
  • It stores values of the first attribute in the first minipage, values of the second attribute in the second minipage, and so on.
  • At the end of each F-minipage (a minipage that holds fixed-length attribute values) there is a presence bit vector with one entry per record that denotes null values for nullable attributes.
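
A rough sketch of the minipage idea, under strong simplifying assumptions: every attribute is a 4-byte integer (so all minipages are F-minipages), each minipage gets an equal share of the page, a slot is reserved even for null values, and the presence bits are kept in Python lists rather than at the end of each minipage. The names `PaxPage`, `scan_attr`, and `get_record` are hypothetical and do not come from the paper.

```python
import struct

VALUE_SIZE = 4  # assumption: every attribute is a 4-byte signed integer


class PaxPage:
    """Toy PAX page: one buffer split into one minipage per attribute."""

    def __init__(self, n_attrs, capacity):
        self.n_attrs = n_attrs
        self.capacity = capacity
        self.n_records = 0
        self.minipage_size = VALUE_SIZE * capacity
        self.buf = bytearray(self.minipage_size * n_attrs)
        # One presence bit per record per attribute (True means non-null).
        self.present = [[False] * capacity for _ in range(n_attrs)]

    def _offset(self, attr, slot):
        # Minipage `attr` starts at attr * minipage_size within the page.
        return attr * self.minipage_size + slot * VALUE_SIZE

    def insert(self, record):
        if len(record) != self.n_attrs:
            raise ValueError("wrong number of attributes")
        if self.n_records == self.capacity:
            raise ValueError("page full")
        slot = self.n_records
        for attr, value in enumerate(record):
            if value is not None:
                struct.pack_into("<i", self.buf, self._offset(attr, slot), value)
                self.present[attr][slot] = True
        self.n_records += 1
        return slot

    def get_value(self, attr, slot):
        if not self.present[attr][slot]:
            return None
        return struct.unpack_from("<i", self.buf, self._offset(attr, slot))[0]

    def scan_attr(self, attr):
        # Reads only the bytes of one minipage, which are contiguous.
        return [self.get_value(attr, s) for s in range(self.n_records)]

    def get_record(self, slot):
        # All attributes of a record live on the same page.
        return tuple(self.get_value(a, slot) for a in range(self.n_attrs))
```

For example, after inserting `(1, 10, 100)` and `(2, None, 200)` into `PaxPage(3, 4)`, `scan_attr(0)` touches only the first minipage, while `get_record(1)` gathers one value from each minipage of the same page.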
Results
  • The graph shows that, while the performance of the NSM and PAX schemes is relatively insensitive to the changes, DSM's performance is very sensitive to the number of attributes used in the query.
  • The first part discusses in detail the cache behavior of NSM and PAX, while the second part presents a performance sensitivity analysis for NSM and PAX as the query projectivity and the number of attributes in the predicate and the relation vary
Conclusion
  • Data accesses to the cache hierarchy are a major performance bottleneck for modern database workloads [1].
  • Commercial DBMSs use NSM (N-ary Storage Model) instead of DSM (Decomposition Storage Model) as the general data placement method, because the latter often incurs high record reconstruction costs.
  • When running TPC-H queries that perform calculations on the data retrieved and require I/O, PAX achieves an 11-48% speedup over NSM.
  • When compared to DSM, PAX cache performance is better and queries execute consistently faster because PAX does not require a join to reconstruct the records
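
To illustrate why DSM record reconstruction needs a join, here is a minimal sketch that assumes each DSM sub-relation is a list of (record id, value) pairs, one list per attribute. The function names `dsm_decompose` and `dsm_reconstruct` are hypothetical, and the dictionary-based join merely stands in for whatever join algorithm a real system would use.

```python
def dsm_decompose(records):
    """Split n-attribute records into n sub-relations of (record_id, value)."""
    n_attrs = len(records[0])
    return [
        [(rid, rec[attr]) for rid, rec in enumerate(records)]
        for attr in range(n_attrs)
    ]


def dsm_reconstruct(subrelations, attrs):
    """Rebuild the projection on `attrs` by joining sub-relations on record id."""
    result = {rid: (value,) for rid, value in subrelations[attrs[0]]}
    # One extra join per additional attribute used by the query.
    for attr in attrs[1:]:
        lookup = dict(subrelations[attr])
        result = {
            rid: values + (lookup[rid],)
            for rid, values in result.items()
            if rid in lookup
        }
    return [result[rid] for rid in sorted(result)]
```

Each attribute beyond the first adds another join pass in this sketch, whereas a layout that keeps all attributes of a record on the same page can reassemble the record without any join.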
Tables
  • Table 1: NSM, DSM, and PAX comparison
  • Table 2: Effect of the “reorganization worthy” threshold on PAX bulk-loading performance
Related work
  • Several recent workload characterization studies report that database systems suffer from high memory-related processor delays when running on modern platforms. A detailed survey of these studies is provided elsewhere [1][34]. All studies that we are aware of agree that stall time due to data cache misses accounts for 50-70% (OLTP [19]) to 90% (DSS [1]) of the total memory-related stall time, even on architectures where the instruction cache miss rate (i.e., the number of cache misses divided by the number of cache references) is typically higher when executing OLTP workloads [21].

    Research in computer architecture, compilers, and database systems has focused on optimizing data placement for cache performance. A compiler-directed approach for cache-conscious data placement profiles a program and applies heuristic algorithms to find a placement solution that optimizes cache utilization [6]. Clustering, compression, and coloring are techniques that can be applied manually by programmers to improve cache per-
Funding
  • Mark Hill is supported in part by the National Science Foundation under grant EIA-9971256 and through donations from Intel Corporation and Sun Microsystems
Reference
  • [1] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), pp. 54-65, Edinburgh, UK, September 1999.
  • [2] A. Ailamaki and D. Slutz. Processor Performance of Selection Queries. Microsoft Research Technical Report MSR-TR-99-94, August 1999.
  • [3] A. Ailamaki, D. J. DeWitt, and M. D. Hill. Walking Four Machines By The Shore. In Proceedings of the Fourth Workshop on Computer Architecture Evaluation using Commercial Workloads, January 2001.
  • [4] P. Boncz, S. Manegold, and M. Kersten. Database Architecture Optimized for the New Bottleneck: Memory Access. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), pp. 266-277, Edinburgh, UK, September 1999.
  • [5] T. Brinkhoff, H.-P. Kriegel, R. Schneider, and B. Seeger. Multi-Step Processing of Spatial Joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 197-208, Minneapolis, MN, May 1994.
  • [6] B. Calder, C. Krintz, S. John, and T. Austin. Cache-Conscious Data Placement. In Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VIII), pp. 139-149, October 1998.
  • [7] M. Carey, D. J. DeWitt, M. Franklin, N. Hall, M. McAuliffe, J. Naughton, D. Schuh, M. Solomon, C. Tan, O. Tsatalos, S. White, and M. Zwilling. Shoring Up Persistent Applications. In Proceedings of the ACM SIGMOD Conference on Management of Data, Minneapolis, MN, May 1994.
  • [8] T. M. Chilimbi, J. R. Larus, and M. D. Hill. Making Pointer-Based Data Structures Cache Conscious. IEEE Computer, December 2000.
  • [9] Compaq Corporation. 21164 Alpha Microprocessor Reference Manual. Online Compaq reference library at http://www.support.compaq.com/alpha-tools/documentation/current/chip-docs.html. Doc. No. EC-QP99C-TE, December 1998.
  • [10] G. P. Copeland and S. F. Khoshafian. A Decomposition Storage Model. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 268-279, May 1985.
  • [11] D. W. Cornell and P. S. Yu. An Effective Approach to Vertical Partitioning for Physical Design of Relational Databases. In IEEE Transactions on Software Engineering, 16(2), February 1990.
  • [12] D. J. DeWitt, N. Kabra, J. Luo, J. Patel, and J. Yu. Client-Server Paradise. In Proceedings of the 20th VLDB International Conference, Santiago, Chile, September 1994.
  • [13] J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing Relations and Indexes. In Proceedings of the IEEE International Conference on Data Engineering, 1998.
  • [14] G. Graefe. Iterators, Schedulers, and Distributed-memory Parallelism. In Software, Practice and Experience, 26(4), pp. 427-452, April 1996.
  • [15] Jim Gray. The benchmark handbook for transaction processing systems. Morgan-Kaufmann Publishers, 2nd edition, 1993.
  • [16] A. Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1984.
  • [17] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 2nd edition, 1996.
  • [18] Intel Corporation. Pentium® II processor developer's manual. Intel Corporation, Order number 243502-001, October 1997.
  • [19] K. Keeton, D. A. Patterson, Y. Q. He, R. C. Raphael, and W. E. Baker. Performance characterization of a quad Pentium Pro SMP using OLTP workloads. In Proceedings of the 25th International Symposium on Computer Architecture, Barcelona, Spain, June 1998.
  • [20] Bruce Lindsay. Personal Communication, February / July 2000.
  • [21] J. L. Lo, L. A. Barroso, S. J. Eggers, K. Gharachorloo, H. M. Levy, and S. S. Parekh. An analysis of database workload performance on simultaneous multithreaded processors. In Proceedings of the 25th International Symposium on Computer Architecture, June 1998.
  • [22] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. In ACM Transactions on Database Systems, 17(1), pp. 94-162, March 1992.
  • [23] M. Nakayama, M. Kitsuregawa, and M. Takagi. Hash-Partitioned Join Method Using Dynamic Destaging Strategy. In Proceedings of the 14th VLDB International Conference, September 1988.
  • [24] S. Navathe, S. Ceri, G. Wiederhold, and J. Dou. Vertical partitioning algorithms for database design. ACM Transactions on Database Systems, 9(4), pp. 680-710, December 1984.
  • [25] P. O’Neil and D. Quass. Improved Query Performance With Variant Indexes. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, May 1997.
  • [26] J. M. Patel and D. J. DeWitt. Partition Based Spatial-Merge Join. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 259-270, Montreal, Canada, June 1996.
  • [27] R. Ramakrishnan and J. Gehrke. Database Management Systems. WCB/McGraw-Hill, 2nd edition, 2000.
  • [28] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access Path Selection In A Relational Database Management System. In Proceedings of the ACM SIGMOD Conference on Management of Data, 1979.
  • [29] A. Shatdal, C. Kant, and J. Naughton. Cache Conscious Algorithms for Relational Query Processing. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 510-512, September 1994.
  • [30] R. Soukup and K. Delaney. Inside SQL Server 7.0. Microsoft Press, 1999.
  • [31] Sun Microelectronics. UltraSparc™ Reference Manual. Online Sun reference library at http://www.sun.com/microelectronics/manuals/ultrasparc/802-7220-02.pdf, July 1997.
  • [32] http://technet.oracle.com/docs/products/oracle8i/doc_index.htm
  • [33] http://www.sybase.com/products/archivedproducts/sybaseiq
  • [34] http://www.cs.wisc.edu/~natassa/papers/PAX_full.pdf