# Compressed Linear Algebra for Large-Scale Machine Learning

PVLDB 9, no. 12 (2016): 960-971

Abstract

Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory. General-purpose, heavy- and lightweight compression techniques struggle …

Introduction

- Data has become a ubiquitous resource [16]. Large-scale machine learning (ML) leverages these large data collections in order to find interesting patterns and build robust predictive models [16, 19].
- SystemML [21] aims at declarative ML [12], where algorithms are expressed in a high-level scripting language with R-like syntax and compiled to hybrid runtime plans that combine single-node, in-memory operations with distributed operations on MapReduce or Spark [28].
- Information about data size and sparsity is propagated from the inputs through the entire program to enable worst-case memory estimates per operation.
- These estimates are used during an operator-selection step, yielding a DAG of low-level operators, which is compiled into a runtime program of executable instructions.
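The worst-case memory estimation above can be sketched as a simple upper-bound calculation per matrix operand. This is an illustrative reconstruction, not SystemML's actual estimator; the byte costs below (8-byte doubles, CSR-like sparse layout) are common assumptions.

```python
def worst_case_mem_bytes(rows, cols, sparsity):
    """Upper-bound memory for a double-precision matrix block.

    Dense: 8 bytes per cell. Sparse (CSR-like): roughly 12 bytes per
    non-zero (8-byte value + 4-byte column index) plus row pointers.
    The worst-case estimate is the cheaper of the two representations.
    """
    dense = 8.0 * rows * cols
    nnz = sparsity * rows * cols
    sparse = 12.0 * nnz + 4.0 * (rows + 1)
    return min(dense, sparse)

# A small 4x4 block at 25% sparsity is cheaper in sparse form (68 bytes),
# while the same block fully dense is bounded by the dense layout (128 bytes).
sparse_est = worst_case_mem_bytes(4, 4, 0.25)
dense_est = worst_case_mem_bytes(4, 4, 1.0)
```

Propagating such per-operand bounds through the program is what lets the compiler pick single-node versus distributed operators before execution.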

Highlights

- Declarative machine learning (ML): state-of-the-art, large-scale ML aims at declarative ML algorithms [12], expressed in high-level languages that are often based on linear algebra, i.e., matrix multiplications, aggregations, and element-wise and statistical operations.
- Compressed linear algebra (CLA) shows similar operations performance at significantly better compression ratios, which is crucial for end-to-end performance improvements of large-scale ML.
- We have initiated work on CLA, in which matrices are compressed with lightweight techniques and linear algebra operations are performed directly over the compressed representation.
- Our experiments show operations performance close to the uncompressed case and compression ratios similar to heavyweight formats like Gzip but better than lightweight formats like Snappy, providing significant performance benefits when data does not fit into memory.
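The idea of performing linear algebra directly over compressed data can be illustrated with a toy matrix-vector multiply over value-indexed column groups. This is a simplified sketch in the spirit of CLA's offset-list encodings, not the paper's actual implementation (a real encoding would skip zeros and compress the offset lists).

```python
class CompressedColumn:
    """A column stored as a map from each distinct value to the row
    offsets where it occurs (a naive offset-list encoding)."""
    def __init__(self, column):
        self.n = len(column)
        self.offsets = {}
        for i, v in enumerate(column):
            self.offsets.setdefault(v, []).append(i)

def compressed_matvec(groups, x):
    """Compute y = M @ x without decompressing M: column j weighted by
    x[j] contributes v * x[j] at the row offsets of each distinct value v."""
    y = [0.0] * groups[0].n
    for j, g in enumerate(groups):
        for v, rows in g.offsets.items():
            contrib = v * x[j]
            for r in rows:
                y[r] += contrib
    return y

# M = [[1, 0], [1, 2], [0, 2]] stored column-wise in compressed form:
groups = [CompressedColumn([1.0, 1.0, 0.0]), CompressedColumn([0.0, 2.0, 2.0])]
y = compressed_matvec(groups, [3.0, 4.0])
```

The key property is that work scales with the number of distinct values and their occurrence lists rather than requiring a decompress-then-multiply step.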

Methods

- The major insights are as follows. Operations performance: CLA achieves in-memory matrix-vector multiply performance close to the uncompressed case.
- Sparse-safe scalar and aggregate operations show large improvements due to value-based computation.
- Compression ratio: CLA yields substantially better compression ratios than lightweight general-purpose compression.
- End-to-end performance: CLA provides large end-to-end performance improvements, of up to 26x, when uncompressed or lightweight-compressed matrices do not fit in memory.
- Effective compression planning: sampling-based compression planning yields both reasonable compression time and good compression plans, i.e., good choices of encoding formats and co-coding schemes.
- The authors obtain good compression ratios at compression costs that are amortized over the repeated iterations of the ML algorithms.
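The "value-based computation" behind the sparse-safe scalar speedups can be sketched as follows: for a dictionary-encoded column, a scalar operation only needs to touch the small dictionary of distinct values once, while the per-cell codes stay untouched. The function and variable names are illustrative, not from the paper.

```python
def scalar_op_on_dictionary(dictionary, op):
    """Apply a scalar operation to each distinct value exactly once.

    dictionary: list of distinct values; per-cell codes index into it
    and are unchanged, so a column of millions of cells costs only
    len(dictionary) applications of `op`.
    """
    return [op(v) for v in dictionary]

dictionary = [0.0, 2.0, 7.0]        # 3 distinct values ...
codes = [0, 1, 1, 2, 0, 1]          # ... shared by 6 cells
new_dict = scalar_op_on_dictionary(dictionary, lambda v: v * v)

# Decoding against the new dictionary equals squaring every cell:
decoded = [new_dict[c] for c in codes]
```

A sparse-safe operation (one that maps zero to zero, like squaring) additionally leaves any implicit zero entries untouched, which is where the largest gains come from.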

Results

- The differences in compressed sizes were less than 2.5% in all cases.
- CLA shows similar operations performance at significantly better compression ratios, which is crucial for end-to-end performance improvements of large-scale ML.
- As long as the data fits in aggregate memory (Mnist80m, 180 GB), all runtimes are almost identical, with Snappy and CLA showing overheads of up to 25% and 10%, respectively.
- For non-iterative algorithms, CLA is up to 32% slower, while Snappy shows less than 12% overhead.

Conclusion

- The authors have initiated work on compressed linear algebra (CLA), in which matrices are compressed with lightweight techniques and linear algebra operations are performed directly over the compressed representation.
- CLA generalizes sparse matrix representations, encoding both dense and sparse matrices in a universal compressed form.
- CLA is broadly applicable to any system that provides blocked matrix representations, linear algebra, and physical data independence.
- Interesting future work includes (1) full optimizer integration, (2) global planning and physical design tuning, (3) alternative compression schemes, and (4) operations beyond matrix-vector multiplication.

- Table 1: Compression Ratios of Real Datasets
- Table 2: Overview of ML Algorithm Core Operations
- Table 3: Compression Plans of Individual Datasets
- Table 4: Compression Ratio, CSR-VI vs. CLA
- Table 5: Size Estimation Accuracy (Average ARE) on Higgs, Census, Covtype, ImageNet, and Mnist8m
- Table 6: Mnist8m Deserialized RDD Storage Size
- Table 7: End-to-End Performance, Mnist40m (90 GB) and Mnist240m (540 GB)
- Table 8: End-to-End Performance, ImageNet15 (65 GB) and ImageNet150 (650 GB)

Related work

- We generalize sparse matrix representations via compression and accordingly review related work of database compression, sparse linear algebra, and compression planning.

Compressed Databases: The notion of compressing databases appears in the literature back in the early 1980s [5, 18], although most early work focuses on the use of general-purpose techniques like Huffman coding. An important exception is the Model 204 database system, which used compressed bitmap indexes to speed up query processing [38]. More recent systems that use bitmap-based compression include FastBit [49], Oracle [39], and Sybase IQ [45]. Graefe and Shapiro's 1991 paper "Data Compression and Database Performance" more broadly introduced the idea of compression to improve query performance by evaluating queries in the compressed domain [23], primarily with dictionary-based compression. Westmann et al. explored storage, query processing, and optimization with regard to lightweight compression techniques [47]. Later, Raman and Swart investigated query processing over heavyweight Huffman coding schemes [40], where they also showed the benefit of column co-coding. Recent examples of relational database systems that use multiple types of compression to speed up query processing include C-Store/Vertica [43], SAP HANA [11], IBM DB2 with BLU Acceleration [41], Microsoft SQL Server [36], and HyPer [35]. SciDB, as an array database, also uses compression but decompresses arrays block-wise for each operation [44]. Further, Kimura et al. made a case for compression-aware physical design tuning to overcome suboptimal design choices [33], which requires estimating the sizes of compressed indexes. Existing estimators focus on compression schemes such as null suppression and dictionary encoding [29], where the latter is again related to estimating the number of distinct values. Other estimators focus on index layouts such as RID list and prefix key compression [9].
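The idea of "evaluating queries in the compressed domain" with dictionary compression, as introduced by Graefe and Shapiro [23], can be sketched in a few lines: a selection predicate is evaluated once against the small dictionary, and rows are then matched by comparing integer codes, never decompressing the column. This is an illustrative sketch, not any particular system's implementation.

```python
def select_in_compressed_domain(dictionary, codes, predicate):
    """Return the row positions satisfying `predicate`, evaluating it only
    on the dictionary of distinct values; per-row work is a code lookup."""
    qualifying = {i for i, v in enumerate(dictionary) if predicate(v)}
    return [row for row, c in enumerate(codes) if c in qualifying]

cities = ["Berlin", "Oslo", "Tokyo"]   # dictionary of distinct values
col = [0, 2, 1, 0, 2]                  # dictionary-encoded column
rows = select_in_compressed_domain(cities, col, lambda v: v.startswith("O"))
```

CLA transfers this principle from relational selections to linear algebra: the operation is rewritten to work on distinct values and their positions rather than on decompressed cells.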

References

- [2] A. Alexandrov et al. The Stratosphere Platform for Big Data Analytics. VLDB J., 23(6), 2014.
- [3] A. Ashari et al. An Efficient Two-Dimensional Blocking Strategy for Sparse Matrix-Vector Multiplication on GPUs. In ICS (Intl. Conf. on Supercomputing), 2014.
- [4] A. Ashari et al. On Optimizing Machine Learning Workloads via Kernel Fusion. In PPoPP (Principles and Practice of Parallel Programming), 2015.
- [6] N. Bell and M. Garland. Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors. In SC (Supercomputing Conf.), 2009.
- [7] J. Bergstra et al. Theano: a CPU and GPU Math Expression Compiler. In SciPy, 2010.
- [8] K. S. Beyer et al. On Synopses for Distinct-Value Estimation Under Multiset Operations. In SIGMOD, 2007.
- [9] B. Bhattacharjee et al. Efficient Index Compression in DB2 LUW. PVLDB, 2(2), 2009.
- [12] M. Boehm et al. Declarative Machine Learning – A Classification of Basic Properties and Types. CoRR, 2016.
- [14] M. Charikar et al. Towards Estimation Error Guarantees for Distinct Values. In SIGMOD, 2000.
- [15] R. Chitta et al. Approximate Kernel k-means: Solution to Large Scale Kernel Clustering. In KDD, 2011.
- [16] J. Cohen et al. MAD Skills: New Analysis Practices for Big Data. PVLDB, 2(2), 2009.
- [17] C. Constantinescu and M. Lu. Quick Estimation of Data Compression and De-duplication for Large Storage Systems. In CCP (Data Compression, Comm. and Process.), 2011.
- [19] S. Das et al. Ricardo: Integrating R and Hadoop. In SIGMOD, 2010.
- [20] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004.
- [21] A. Ghoting et al. SystemML: Declarative Machine Learning on MapReduce. In ICDE, 2011.
- [23] G. Graefe and L. D. Shapiro. Data Compression and Database Performance. In Applied Computing, 1991.
- [24] P. J. Haas and L. Stokes. Estimating the Number of Classes in a Finite Population. J. Amer. Statist. Assoc., 93(444), 1998.
- [25] D. Harnik et al. Estimation of Deduplication Ratios in Large Data Sets. In MSST (Mass Storage Sys. Tech.), 2012.
- [26] D. Harnik et al. To Zip or not to Zip: Effective Resource Usage for Real-Time Compression. In FAST, 2013.
- [27] B. Huang et al. Cumulon: Optimizing Statistical Data Analysis in the Cloud. In SIGMOD, 2013.
- [28] B. Huang et al. Resource Elasticity for Large-Scale Machine Learning. In SIGMOD, 2015.
- [29] S. Idreos et al. Estimating the Compression Fraction of an Index using Sampling. In ICDE, 2010.
- [30] N. L. Johnson et al. Univariate Discrete Distributions. Wiley, New York, 2nd edition, 1992.
- [31] V. Karakasis et al. An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication. TPDS (Trans. Par. and Dist. Systems), 24(10), 2013.
- [32] D. Kernert et al. SLACID - Sparse Linear Algebra in a Column-Oriented In-Memory Database System. In SSDBM, 2014.
- [33] H. Kimura et al. Compression Aware Physical Database Design. PVLDB, 4(10), 2011.
- [34] K. Kourtis et al. Optimizing Sparse Matrix-Vector Multiplication Using Index and Value Compression. In CF (Computing Frontiers), 2008.
- [35] H. Lang et al. Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. In SIGMOD, 2016.
- [36] P. Larson et al. SQL Server Column Store Indexes. In SIGMOD, 2011.
- [40] V. Raman and G. Swart. How to Wring a Table Dry: Entropy Compression of Relations and Querying of Compressed Relations. In VLDB, 2006.
- [41] V. Raman et al. DB2 with BLU Acceleration: So Much More than Just a Column Store. PVLDB, 6(11), 2013.
- [43] M. Stonebraker et al. C-Store: A Column-oriented DBMS. In VLDB, 2005.
- [44] M. Stonebraker et al. The Architecture of SciDB. In SSDBM, 2011.
- [46] G. Valiant and P. Valiant. Estimating the Unseen: An n/log(n)-sample Estimator for Entropy and Support Size, Shown Optimal via New CLTs. In STOC, 2011.
- [47] T. Westmann et al. The Implementation and Performance of Compressed Databases. SIGMOD Record, 29(3), 2000.
- [48] S. Williams et al. Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms. In SC (Supercomputing Conf.), 2007.
- [49] K. Wu et al. Optimizing Bitmap Indices With Efficient Compression. TODS, 31(1), 2006.
- [50] L. Yu et al. Exploiting Matrix Dependency for Efficient Distributed Matrix Computation. In SIGMOD, 2015.
- [51] M. Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In NSDI, 2012.
- [52] C. Zhang et al. Materialization Optimizations for Feature Selection Workloads. In SIGMOD, 2014.
