# Declarative recursive computation on an RDBMS: or, why you should use a database for distributed machine learning

Proceedings of the VLDB Endowment, no. 7 (2019): 822-835

Abstract

A number of popular systems, most notably Google's TensorFlow, have been implemented from the ground up to support machine learning tasks. We consider how to make a very small set of changes to a modern relational database management system (RDBMS) to make it suitable for distributed learning computations. Changes include adding better su…

Introduction

- Modern machine learning (ML) platforms such as TensorFlow [10] have primarily been designed to support data parallelism, where a set of almost-identical computations are executed in parallel over a set of computational units.
- The only difference among the computations is that each operates over different training data.
- Data parallelism has its limits.
- Data parallelism implicitly assumes that the model being learned can fit in the RAM of a computational unit.
- This is not always a reasonable assumption.
- A state-of-the-art NVIDIA Tesla V100 Tensor Core GPU (a $10,000 data center GPU) has 32GB of RAM.
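As a toy illustration of the data-parallel pattern described in the bullets above (the model, numbers, and learning rate here are hypothetical, not from the paper): each worker computes a gradient over its own shard of the training data, and the per-worker gradients are averaged into a single update. Note that every worker holds the full model, which is exactly why the model must fit in one machine's RAM.

```python
# Toy data parallelism for a linear model y = w*x with squared-error loss.
# Each "worker" sees only its own shard; gradients are averaged centrally.

def shard_gradient(w, shard):
    # d/dw of mean (w*x - y)^2 over one worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.005):
    # conceptually, each shard's gradient is computed on a different machine
    grads = [shard_gradient(w, s) for s in shards]
    return w - lr * sum(grads) / len(grads)

data = [(x, 3.0 * x) for x in range(1, 9)]   # ground truth: w = 3
shards = [data[:4], data[4:]]                 # two workers, disjoint data
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
```

Because only the data is partitioned, the computation on every worker is identical, which is the "almost-identical computations" property the bullet list describes.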

Highlights

- Modern machine learning (ML) platforms such as TensorFlow [10] have primarily been designed to support data parallelism, where a set of almost-identical computations are executed in parallel over a set of computational units
- We have argued that a parallel/distributed relational database management system has promise as a backend for large scale machine learning computations
- We have considered unrolling recursive computations into a monolithic compute plan, which is broken into frames that are optimized and executed independently
- We have shown that when implemented on top of a relational database management system, these ideas result in machine learning computations that are model parallel— that is, able to handle large and complex models that need to be distributed across machines or compute units
- We have shown that model parallel, relational database management system-based machine learning computations scale well compared to TensorFlow, and that for Word2Vec and latent Dirichlet allocation, the relational database management system-based computations can be faster than TensorFlow
- The relational database management system was slower than TensorFlow for GPU-based implementations of neural networks.
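The unrolling-and-cutting idea from the highlights can be sketched in miniature. This is my own illustration, not the paper's algorithm: plan steps are reduced to labeled strings, the recursion is flattened into one long plan, and the plan is then split into fixed-size frames that could be optimized and executed independently.

```python
# Unroll a recursive computation w_{i+1} = f(w_i) into one flat plan,
# then cut the plan into frames.

def unroll(n_iters, ops_per_iter):
    # each step is named "iterN/opM"; a real plan would hold relational operators
    return [f"iter{i}/op{j}" for i in range(n_iters) for j in range(ops_per_iter)]

def cut_into_frames(plan, frame_size):
    # naive cutter: consecutive chunks of at most frame_size steps
    return [plan[i:i + frame_size] for i in range(0, len(plan), frame_size)]

plan = unroll(n_iters=3, ops_per_iter=4)      # 12 steps total
frames = cut_into_frames(plan, frame_size=5)  # frames of at most 5 steps
```

A real cutter would choose frame boundaries to minimize the cost of the tables materialized between frames rather than chunking blindly; the experiments in this summary compare several such cutting algorithms.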

Methods

- In Section 7.1, the authors detail a set of experiments aimed at answering the following question: can the ideas described in this paper be used to re-purpose an RDBMS so that it can be used to implement scalable, performant, model-parallel ML computations?
- The authors implement the ideas in this paper on top of SimSQL, a research-prototype distributed database system [18]. SimSQL has a cost-based optimizer, an assortment of implementations of the standard relational operations, and the ability to pipeline those operations and make use of “interesting” physical data organizations.
- It has native matrix and vector support [39].
- LDA is interesting because it benefits the most from a model-parallel implementation
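One way to picture how an RDBMS with native matrix and vector support can run model-parallel linear algebra: a matrix stored as a relation of (row, col, value) tuples is multiplied via a join on the shared index plus a grouped sum, the relational plan for `SELECT a.row, b.col, SUM(a.val * b.val) FROM A a JOIN B b ON a.col = b.row GROUP BY a.row, b.col`. The sketch below (my own illustration, not SimSQL's implementation) simulates that plan in one process; partitioning the tuples across machines is implied, not shown.

```python
# Matrix multiply expressed as join + grouped aggregation over tuple relations.
from collections import defaultdict

def relational_matmul(A, B):
    # A, B: iterables of (row, col, value) tuples
    by_row = defaultdict(list)
    for r, c, v in B:
        by_row[r].append((c, v))
    out = defaultdict(float)
    for r, k, va in A:                 # join on A.col = B.row
        for c, vb in by_row[k]:
            out[(r, c)] += va * vb     # GROUP BY (row, col), SUM
    return dict(out)

A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0)]   # [[1, 2], [3, 0]]
B = [(0, 0, 4.0), (1, 0, 5.0)]                # [[4], [5]]
C = relational_matmul(A, B)
```

Because the tuples of A and B can live on different machines and the join/aggregate are already distributed operations in a parallel RDBMS, the same plan is naturally model parallel: no single node ever needs the whole matrix.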

Results

- Using ten CPU machines, the authors run FFNN learning (40,000 hidden neurons, batch size 10,000), W2V learning (100-dimensional embedding) and LDA (1,000 topics), using three different cutting algorithms.
- The authors use the full solver, but rather than taking a probabilistic view of the problem (Section 6.3), they apply the idea of reducing the number of edges across frames, as these correspond to tables that must be materialized.
- The authors report the per-iteration running time of the various options in Figure 6.
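A minimal sketch of the scoring criterion described above, under the assumption stated in the text that every edge crossing a frame boundary corresponds to an intermediate table that must be materialized; the graph and the candidate cuts below are made up for illustration.

```python
# Score a cut of the compute plan by the total weight of edges that cross
# frame boundaries (each crossing edge = one materialized table).

def cut_cost(edges, assignment):
    # edges: (src, dst, weight); assignment: node -> frame id
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])

edges = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 2.0)]
cut1 = {"a": 0, "b": 0, "c": 1, "d": 1}   # only b->c crosses (weight 1)
cut2 = {"a": 0, "b": 1, "c": 1, "d": 1}   # only a->b crosses (weight 3)
best = min((cut1, cut2), key=lambda asg: cut_cost(edges, asg))
```

A real cutter must also respect frame-size limits and dependency order, which is what makes the problem a generalized quadratic assignment problem rather than a plain min-cut.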

Conclusion

- To illustrate how the frames generated from the weight-optimized cutter differ from the min-cut version of the GQAP, the authors present Figure 13, which shows the set of frames obtained using these two options to cut an unrolling of a single iteration of FFNN learning.
- In this graph, the authors show the relational operators that accept input into each frame and produce output from each frame.
- Though some of this discrepancy was due to the fact that the authors implemented the ideas on top of a research-prototype, high-latency Java/Hadoop system, reducing that gap is an attractive target for future work.

Related work

- Distributed learning systems. The parameter server architecture [49, 38] was proposed to provide scalable, parallel training for machine learning models. A parameter server consists of two components: a parameter server (or key-value store) and a set of workers who repeatedly access and update the model parameters.
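The two components described above can be sketched as a single-process toy (class and method names are my own, and a real parameter server is distributed and concurrent): a key-value store holds the parameters, and workers pull current values, compute updates locally, and push deltas back.

```python
# Toy parameter-server pattern: one store, several "workers" pushing updates.

class ParameterServer:
    def __init__(self, params):
        self.params = dict(params)   # the key-value store of model parameters

    def pull(self, key):
        return self.params[key]      # worker reads the current value

    def push(self, key, delta):
        self.params[key] += delta    # worker's update is applied in place

server = ParameterServer({"w": 0.0})
for worker_delta in [0.5, 0.25, -0.1]:   # deltas from three "workers"
    _ = server.pull("w")                 # read, compute locally (elided)...
    server.push("w", worker_delta)       # ...then write the update back
```

The asynchrony questions that dominate the parameter-server literature (stale reads, lock-free pushes) arise precisely because, unlike in this toy, pulls and pushes from different workers interleave.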

DistBelief [26] is a framework that targets training large, deep neural networks on a number of machines. It utilizes a parameter-server-like architecture, where model parallelism is enabled by distributing the nodes of a neural network across different machines. While the efficacy of this architecture was tested on two optimization algorithms (Downpour SGD and Sandblaster L-BFGS), it is unclear precisely what support DistBelief provides for declarative or automated model parallelism; for example, the DistBelief paper does not describe how the matrix-matrix multiplication needed to compute activations is implemented if the two matrices are partitioned across a set of machines (as [26] implied).

Funding

- Work presented in this paper has been supported by the DARPA MUSE program, award No. FA8750-142-0270, and by the NSF under grant Nos. 1355998 and 1409543.

References

- BigDL. https://bigdl-project.github.io/master/, 2017. Accessed Sep 1, 2018.
- Caffe2. https://caffe2.ai, 2017. Accessed Sep 1, 2018.
- Chainer. https://chainer.org/, 2017. Accessed Sep 1, 2018.
- Gluon. https://github.com/gluon-api/gluon-api, 2017. Accessed Sep 1, 2018.
- Introducing Apache Spark Datasets. https://databricks.com/blog/2016/01/04/introducing-apache-spark-datasets.html, 2017. Accessed Sep 1, 2018.
- Keras. https://keras.io/, 2017. Accessed Sep 1, 2018.
- PyTorch. http://pytorch.org, 2017. Accessed Sep 1, 2018.
- Deeplearning4j. https://deeplearning4j.org/, 2017. Accessed Sep 1, 2018.
- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv preprint arXiv:1603.04467, 2016.
- M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A system for large-scale machine learning. In OSDI, pages 265–283, 2016.
- A. V. Aho and J. D. Ullman. The universality of data retrieval languages. In POPL, pages 110–119, 1979.
- N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In STOC, pages 20–29, 1996.
- M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia. Spark SQL: Relational data processing in Spark. SIGMOD, pages 1383–1394, 2015.
- J. Bergstra, F. Bastien, O. Breuleux, P. Lamblin, R. Pascanu, O. Delalleau, G. Desjardins, D. Warde-Farley, I. J. Goodfellow, A. Bergeron, and Y. Bengio. Theano: Deep learning on gpus with python. In NIPS, 2011.
- L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, et al. ScaLAPACK users’ guide, volume 4. 1997.
- D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. In NIPS, 2003.
- R. Burkard, T. Bönniger, G. Katzakidis, and U. Derigs. Assignment and Matching Problems: Solution Methods with FORTRAN-Programs. Lecture Notes in Economics and Mathematical Systems. 2013.
- Z. Cai, Z. Vagena, L. Perez, S. Arumugam, P. J. Haas, and C. Jermaine. Simulation of database-valued Markov chains using SimSQL. In SIGMOD, pages 637–648, 2013.
- P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas. Apache flinkTM: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38:28–38, 2015.
- S. Chaudhuri. An overview of query optimization in relational systems. In PODS, pages 34–43, 1998.
- J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous sgd. arXiv preprint arXiv:1604.00981, 2016.
- T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274, 2015.
- T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project adam: Building an efficient and scalable deep learning training system. In OSDI, pages 571–582, 2014.
- A. Coates, B. Huval, T. Wang, D. J. Wu, B. Catanzaro, and A. Y. Ng. Deep learning with cots hpc systems. In ICML, 2013.
- R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. In NIPS, 2011.
- J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, pages 1223–1231, 2012.
- F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. SAP HANA database: Data management for modern business applications. SIGMOD, 40(4):45–51, 2012.
- A. L. Gaunt, M. A. Johnson, M. Riechert, D. Tarlow, R. Tomioka, D. Vytiniotis, and S. Webster. AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks. arXiv preprint arXiv:1705.09786, 2017.
- A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, V. Sindhwani, S. Tatikonda, Y. Tian, and S. Vaithyanathan. SystemML: Declarative machine learning on mapreduce. In ICDE, pages 231–242, 2011.
- P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch sgd: training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
- W. D. Hillis and G. L. Steele, Jr. Data parallel algorithms. CACM, 29(12):1170–1183, 1986.
- K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Netw, 2(5):359–366, 1989.
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In MM, pages 675–678, 2014.
- N. Kabra and D. J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. volume 27, pages 106–117, 1998.
- A. Kemper and T. Neumann. Hyper: A hybrid oltp&olap main memory database system based on virtual memory snapshots. In ICDE, pages 195–206, 2011.
- A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014.
- C.-G. Lee and Z. Ma. The generalized quadratic assignment problem. 2004.
- M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the parameter server. In OSDI, pages 583–598, 2014.
- S. Luo, Z. J. Gao, M. Gubanov, L. L. Perez, and C. Jermaine. Scalable linear algebra on a relational database system. In ICDE, pages 523–534, 2017.
- N. May, W. Lehner, S. H. P., N. Maheshwari, C. Müller, S. Chowdhuri, and A. K. Goel. SAP HANA - from relational OLAP database to big data infrastructure. In EDBT, pages 581–592, 2015.
- T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119. 2013.
- G. Neubig, C. Dyer, Y. Goldberg, A. Matthews, W. Ammar, A. Anastasopoulos, M. Ballesteros, D. Chiang, D. Clothiaux, T. Cohn, K. Duh, M. Faruqui, C. Gan, D. Garrette, Y. Ji, L. Kong, A. Kuncoro, G. Kumar, C. Malaviya, P. Michel, Y. Oda, M. Richardson, N. Saphra, S. Swayamdipta, and P. Yin. DyNet: The Dynamic Neural Network Toolkit. arXiv preprint arXiv:1701.03980, 2017.
- T. Neumann, T. Mühlbauer, and A. Kemper. Fast serializable multi-version concurrency control for main-memory database systems. In SIGMOD, pages 677–689, 2015.
- L. Passing, M. Then, N. Hubig, H. Lang, M. Schreier, S. Günnemann, A. Kemper, and T. Neumann. SQL- and operator-centric data analytics in relational main-memory databases. In EDBT, pages 84–95, 2017.
- B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, pages 693–701, 2011.
- S. Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
- N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. CoRR, abs/1701.06538, 2017.
- A. Smola and S. Narayanamurthy. An architecture for parallel topic models. PVLDB, 3(1-2):703–710, 2010.
- E. P. Xing, Q. Ho, W. Dai, J. K. Kim, J. Wei, S. Lee, X. Zheng, P. Xie, A. Kumar, and Y. Yu. Petuum: A new platform for distributed machine learning on big data. KDD, 1(2):49–67, 2015.
- D. Yu, A. Eversole, M. Seltzer, K. Yao, O. Kuchaiev, Y. Zhang, F. Seide, Z. Huang, B. Guenter, H. Wang, J. Droppo, G. Zweig, C. Rossbach, J. Gao, A. Stolcke, J. Currey, M. Slaney, G. Chen, A. Agarwal, C. Basoglu, M. Padmilac, A. Kamenev, V. Ivanov, S. Cypher, H. Parthasarathi, B. Mitra, B. Peng, and X. Huang. An introduction to computational networks and the computational network toolkit. Technical report, 2014.
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In HotCloud, pages 1–10, 2010.
- H. Zhang, Z. Hu, J. Wei, P. Xie, G. Kim, Q. Ho, and E. Xing. Poseidon: A system architecture for efficient gpu-based deep learning on multiple machines. arXiv preprint arXiv:1512.06216, 2015.
