Declarative recursive computation on an RDBMS: or, why you should use a database for distributed machine learning

Proceedings of the VLDB Endowment, no. 7 (2019): 822-835


Abstract

A number of popular systems, most notably Google's TensorFlow, have been implemented from the ground up to support machine learning tasks. We consider how to make a very small set of changes to a modern relational database management system (RDBMS) to make it suitable for distributed learning computations. Changes include adding better support...

Introduction
  • Modern machine learning (ML) platforms such as TensorFlow [10] have primarily been designed to support data parallelism, where a set of almost-identical computations are executed in parallel over a set of computational units.
  • The only difference among the computations is that each operates over different training data.
  • Data parallelism has its limits (a minimal sketch contrasting it with model parallelism follows this list).
  • Data parallelism implicitly assumes that the model being learned can fit in the RAM of a computational unit.
  • This is not always a reasonable assumption.
  • A state-of-the-art NVIDIA Tesla V100 Tensor Core GPU (a $10,000 data center GPU) has 32GB of RAM.
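To make the contrast concrete, here is a minimal sketch (not from the paper) of the two regimes for a single linear layer: under data parallelism every worker holds a full copy of the weight matrix W and computes a gradient on its own shard of the training data, while under model parallelism W itself is partitioned column-wise across workers, so no single worker ever needs the entire model in memory. All sizes, the learning rate, and the synchronous averaging step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, d_in, d_out, lr = 4, 8, 6, 0.1               # illustrative sizes only
X, Y = rng.normal(size=(40, d_in)), rng.normal(size=(40, d_out))

# Data parallelism: every worker holds the FULL weight matrix W and a data shard.
W = rng.normal(size=(d_in, d_out))
shards = np.array_split(np.arange(len(X)), n_workers)   # training rows per worker
grads = []
for rows in shards:                                      # conceptually, one machine each
    err = X[rows] @ W - Y[rows]                          # local forward pass
    grads.append(X[rows].T @ err / len(rows))            # local gradient on the shard
W -= lr * np.mean(grads, axis=0)                         # aggregate (e.g. an all-reduce)

# Model parallelism: the weight matrix itself is split; worker k owns only W_blocks[k].
W_blocks = np.array_split(rng.normal(size=(d_in, d_out)), n_workers, axis=1)
Y_blocks = np.array_split(Y, n_workers, axis=1)
for k in range(n_workers):                               # each worker updates its own block
    err_k = X @ W_blocks[k] - Y_blocks[k]                # only this slice of the output
    W_blocks[k] -= lr * (X.T @ err_k) / len(X)
```

The second case is the one the paper targets: once the model no longer fits on one device, the computation has to be expressed over pieces of the model rather than over pieces of the data.
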
Highlights
  • Modern machine learning (ML) platforms such as TensorFlow [10] have primarily been designed to support data parallelism, where a set of almost-identical computations are executed in parallel over a set of computational units
  • We have argued that a parallel/distributed relational database management system has promise as a backend for large scale machine learning computations
  • We have considered unrolling recursive computations into a monolithic compute plan, which is broken into frames that are optimized and executed independently
  • We have shown that when implemented on top of a relational database management system, these ideas result in machine learning computations that are model parallel, that is, able to handle large and complex models that need to be distributed across machines or compute units
  • We have shown that model parallel, relational database management system-based machine learning computations scale well compared to TensorFlow, and that for Word2Vec and latent Dirichlet allocation, the relational database management system-based computations can be faster than TensorFlow
  • The relational database management system was slower than TensorFlow for GPU-based implementations of neural networks
Methods
  • 7.1 Overview

    The authors detail a set of experiments aimed at answering the following question: Can the ideas described in this paper be used to re-purpose an RDBMS so that it can be used to implement scalable, performant, model-parallel ML computations?

    The authors implement the ideas in this paper on top of SimSQL, a research-prototype distributed database system [18].
  • SimSQL has a cost-based optimizer, an assortment of implementations of the standard relational operations, and the ability to pipeline those operations and make use of “interesting” physical data organizations.
  • It has native matrix and vector support [39] (see the relational matrix-multiply sketch after this list).
  • LDA is interesting because it benefits the most from a model-parallel implementation.
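As a hedged illustration of how such computations map onto relational operators (my own sketch, not SimSQL code): a large matrix can be stored as a relation of (row_block, col_block, tile) tuples, and a distributed matrix multiply then becomes a join on the shared block index followed by a grouped sum, roughly "SELECT A.rb, B.cb, SUM(matmul(A.tile, B.tile)) FROM A JOIN B ON A.cb = B.rb GROUP BY A.rb, B.cb". The helper names and block size below are made up for the example.

```python
import numpy as np
from collections import defaultdict

def to_relation(M, bs):
    """Represent matrix M as tuples (row_block, col_block, tile) of bs x bs tiles."""
    return [(i // bs, j // bs, M[i:i + bs, j:j + bs])
            for i in range(0, M.shape[0], bs)
            for j in range(0, M.shape[1], bs)]

def relational_matmul(A_rel, B_rel):
    """Join A and B on A.col_block = B.row_block, multiply tiles, sum per group."""
    acc = defaultdict(lambda: 0)
    for arb, acb, ablk in A_rel:
        for brb, bcb, bblk in B_rel:
            if acb == brb:                                 # the join predicate
                acc[(arb, bcb)] = acc[(arb, bcb)] + ablk @ bblk   # grouped aggregation
    return acc

rng = np.random.default_rng(1)
A, B, bs = rng.normal(size=(6, 4)), rng.normal(size=(4, 8)), 2
C = relational_matmul(to_relation(A, bs), to_relation(B, bs))
C_full = np.block([[C[(i, j)] for j in range(B.shape[1] // bs)]
                   for i in range(A.shape[0] // bs)])
assert np.allclose(C_full, A @ B)                          # matches an ordinary matmul
```

Because the tuples of each relation can be partitioned across machines, the same plan runs whether or not either matrix fits on a single node, which is the sense in which an RDBMS backend naturally supports model parallelism.
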
Results
  • Using ten CPU machines, the authors run FFNN learning (40,000 hidden neurons, batch size 10,000), W2V learning (100-dimensional embedding) and LDA (1,000 topics), using three different cutting algorithms.
  • The authors use the full solver, but rather than taking a probabilistic view of the problem (Section 6.3), they apply the idea of reducing the number of edges across frames, as these correspond to tables that must be materialized (an illustrative cutting sketch follows this list).
  • The authors report the per-iteration running time of the various options in Figure 6
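The role of the cutting algorithms can be pictured with the small sketch below. It is an illustrative greedy heuristic, not the paper's GQAP-based solver: operators of the unrolled plan are merged into frames, heaviest edge first and subject to a frame-size cap, because any edge left crossing a frame boundary is an intermediate table that must be materialized. Node names, edge weights, and the capacity are made-up examples.

```python
# Illustrative greedy cutter: merge the endpoints of the heaviest edges into the
# same frame first, subject to a frame-size cap, so that the largest intermediate
# tables are not materialized across frame boundaries.
plan_edges = {                       # (producer, consumer) -> size of intermediate
    ("scan_w", "matmul_1"): 100, ("scan_x", "matmul_1"): 80,
    ("matmul_1", "relu_1"): 100, ("relu_1", "matmul_2"): 100,
    ("matmul_2", "loss"): 10,
}
capacity = 3                         # max operators per frame (assumed limit)

frame = {}                           # operator -> frame id
members = {}                         # frame id -> set of operators
next_id = 0
for (u, v), w in sorted(plan_edges.items(), key=lambda kv: -kv[1]):
    for n in (u, v):
        if n not in frame:
            frame[n], members[next_id] = next_id, {n}
            next_id += 1
    fu, fv = frame[u], frame[v]
    if fu != fv and len(members[fu]) + len(members[fv]) <= capacity:
        for n in members[fv]:        # merge the two frames: this edge stays internal
            frame[n] = fu
        members[fu] |= members.pop(fv)

# Every edge whose endpoints sit in different frames is a table that must be
# materialized; the cut cost is the total size of those tables.
cut_cost = sum(w for (u, v), w in plan_edges.items() if frame[u] != frame[v])
print(members, "materialized:", cut_cost)
```

In the paper's setting the edge weights would presumably come from the optimizer's size estimates, and the assignment of operators to frames is solved as a generalized quadratic assignment problem rather than greedily.
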
Conclusion
  • To illustrate how the frames generated by the weight-optimized cutter differ from the min-cut version of the GQAP, the authors present Figure 13, which shows the set of frames obtained using these two options to cut an unrolling of a single iteration of FFNN learning
  • In this graph, the authors show the relational operators that accept input into each frame and produce output from each frame.
  • Though some of the performance gap relative to TensorFlow on GPUs was due to the fact that the authors implemented the ideas on top of a research-prototype, high-latency Java/Hadoop system, reducing that gap is an attractive target for future work
Related work
  • Distributed learning systems. The parameter server architecture [49, 38] was proposed to provide scalable, parallel training for machine learning models. It consists of two components: a parameter server (or key-value store) and a set of workers who repeatedly access and update the model parameters (a minimal sketch of this pull/compute/push loop follows the next paragraph).

    DistBelief [26] is a framework that targets training large, deep neural networks on a number of machines. It utilizes a parameter-server-like architecture, where model parallelism is enabled by distributing the nodes of a neural network across different machines. While the efficacy of this architecture was tested on two optimization algorithms (Downpour SGD and Sandblaster L-BFGS), it is unclear precisely what support DistBelief provides for declarative or automated model parallelism; for example, the DistBelief paper did not describe how the matrix-matrix multiplication needed to compute activations is implemented if the two matrices are partitioned across a set of machines (as [26] implied).
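For orientation, the sketch below shows the basic pull/compute/push loop of a parameter server in miniature; it is a toy, in-process illustration and not the API of any of the cited systems. The shapes, learning rate, and sequential (rather than asynchronous) schedule are assumptions made for clarity.

```python
import numpy as np

class ParameterServer:
    """Toy in-process key-value store for model parameters (illustrative only)."""
    def __init__(self, shapes, lr=0.1):
        self.params = {k: np.zeros(s) for k, s in shapes.items()}
        self.lr = lr
    def pull(self, key):                     # worker fetches current parameters
        return self.params[key].copy()
    def push(self, key, grad):               # worker sends a gradient; server applies it
        self.params[key] -= self.lr * grad

def worker_step(server, X_shard, y_shard):
    w = server.pull("w")                                         # 1. pull
    grad = X_shard.T @ (X_shard @ w - y_shard) / len(X_shard)    # 2. local gradient
    server.push("w", grad)                                       # 3. push

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
server = ParameterServer({"w": (5,)})
shards = np.array_split(np.arange(100), 4)   # one data shard per worker
for _ in range(20):                          # real workers run asynchronously;
    for rows in shards:                      # sequential here for clarity
        worker_step(server, X[rows], y[rows])
```

In a DistBelief-style model-parallel setup, the parameter blocks themselves would additionally be sharded across multiple server nodes rather than held under a single key.
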
Funding
  • Work presented in this paper has been supported by the DARPA MUSE program, award No FA8750-142-0270 and by the NSF under grant Nos. 1355998 and 1409543
References
  • BigDL. https://bigdl-project.github.io/master/, 2017. Accessed Sep 1, 2018.
  • Caffe2. https://caffe2.ai, 2017. Accessed Sep 1, 2018.
  • Chainer. https://chainer.org/, 2017. Accessed Sep 1, 2018.
  • Gluon. https://github.com/gluon-api/gluon-api, 2017. Accessed Sep 1, 2018.
  • Introducing Apache Spark Datasets. https://databricks.com/blog/2016/01/04/introducing-apache-spark-datasets.html, 2017. Accessed Sep 1, 2018.
  • Keras. https://keras.io/, 2017. Accessed Sep 1, 2018.
  • PyTorch. http://pytorch.org, 2017. Accessed Sep 1, 2018.
  • Deeplearning4j. https://deeplearning4j.org/, 2017. Accessed Sep 1, 2018.
  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv preprint arXiv:1603.04467, 2016.
  • M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A system for large-scale machine learning. In OSDI, pages 265–283, 2016.
  • A. V. Aho and J. D. Ullman. The universality of data retrieval languages. In POPL, pages 110–119, 1979.
  • N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In STOC, pages 20–29, 1996.
  • M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, and M. Zaharia. Spark SQL: Relational data processing in Spark. In SIGMOD, pages 1383–1394, 2015.
  • J. Bergstra, F. Bastien, O. Breuleux, P. Lamblin, R. Pascanu, O. Delalleau, G. Desjardins, D. Warde-Farley, I. J. Goodfellow, A. Bergeron, and Y. Bengio. Theano: Deep learning on GPUs with Python. In NIPS, 2011.
  • L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, et al. ScaLAPACK Users' Guide, volume 4. 1997.
  • D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. In NIPS, 2003.
  • R. Burkard, T. Bönniger, G. Katzakidis, and U. Derigs. Assignment and Matching Problems: Solution Methods with FORTRAN-Programs. Lecture Notes in Economics and Mathematical Systems. 2013.
  • Z. Cai, Z. Vagena, L. Perez, S. Arumugam, P. J. Haas, and C. Jermaine. Simulation of database-valued Markov chains using SimSQL. In SIGMOD, pages 637–648, 2013.
  • P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas. Apache Flink: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38:28–38, 2015.
  • S. Chaudhuri. An overview of query optimization in relational systems. In PODS, pages 34–43, 1998.
  • J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous SGD. arXiv preprint arXiv:1604.00981, 2016.
  • T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274, 2015.
  • T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In OSDI, pages 571–582, 2014.
  • A. Coates, B. Huval, T. Wang, D. J. Wu, B. Catanzaro, and A. Y. Ng. Deep learning with COTS HPC systems. In ICML, 2013.
  • R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In NIPS, 2011.
  • J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, pages 1223–1231, 2012.
  • F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. SAP HANA database: Data management for modern business applications. SIGMOD Record, 40(4):45–51, 2012.
  • A. L. Gaunt, M. A. Johnson, M. Riechert, D. Tarlow, R. Tomioka, D. Vytiniotis, and S. Webster. AMPNet: Asynchronous Model-Parallel Training for Dynamic Neural Networks. arXiv preprint arXiv:1705.09786, 2017.
  • A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, V. Sindhwani, S. Tatikonda, Y. Tian, and S. Vaithyanathan. SystemML: Declarative machine learning on MapReduce. In ICDE, pages 231–242, 2011.
  • P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
  • W. D. Hillis and G. L. Steele, Jr. Data parallel algorithms. CACM, 29(12):1170–1183, 1986.
  • K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
  • Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In MM, pages 675–678, 2014.
  • N. Kabra and D. J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD, pages 106–117, 1998.
  • A. Kemper and T. Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In ICDE, pages 195–206, 2011.
  • A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014.
  • C.-G. Lee and Z. Ma. The generalized quadratic assignment problem. 2004.
  • M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the parameter server. In OSDI, pages 583–598, 2014.
  • S. Luo, Z. J. Gao, M. Gubanov, L. L. Perez, and C. Jermaine. Scalable linear algebra on a relational database system. In ICDE, pages 523–534, 2017.
  • N. May, W. Lehner, S. H. P., N. Maheshwari, C. Müller, S. Chowdhuri, and A. K. Goel. SAP HANA - from relational OLAP database to big data infrastructure. In EDBT, pages 581–592, 2015.
  • T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119, 2013.
  • G. Neubig, C. Dyer, Y. Goldberg, A. Matthews, W. Ammar, A. Anastasopoulos, M. Ballesteros, D. Chiang, D. Clothiaux, T. Cohn, K. Duh, M. Faruqui, C. Gan, D. Garrette, Y. Ji, L. Kong, A. Kuncoro, G. Kumar, C. Malaviya, P. Michel, Y. Oda, M. Richardson, N. Saphra, S. Swayamdipta, and P. Yin. DyNet: The Dynamic Neural Network Toolkit. arXiv preprint arXiv:1701.03980, 2017.
  • T. Neumann, T. Mühlbauer, and A. Kemper. Fast serializable multi-version concurrency control for main-memory database systems. In SIGMOD, pages 677–689, 2015.
  • L. Passing, M. Then, N. Hubig, H. Lang, M. Schreier, S. Günnemann, A. Kemper, and T. Neumann. SQL- and operator-centric data analytics in relational main-memory databases. In EDBT, pages 84–95, 2017.
  • B. Recht, C. Re, S. Wright, and F. Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, pages 693–701, 2011.
  • S. Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
  • N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. CoRR, abs/1701.06538, 2017.
  • A. Smola and S. Narayanamurthy. An architecture for parallel topic models. PVLDB, 3(1-2):703–710, 2010.
  • E. P. Xing, Q. Ho, W. Dai, J. K. Kim, J. Wei, S. Lee, X. Zheng, P. Xie, A. Kumar, and Y. Yu. Petuum: A new platform for distributed machine learning on big data. KDD, 1(2):49–67, 2015.
  • D. Yu, A. Eversole, M. Seltzer, K. Yao, O. Kuchaiev, Y. Zhang, F. Seide, Z. Huang, B. Guenter, H. Wang, J. Droppo, G. Zweig, C. Rossbach, J. Gao, A. Stolcke, J. Currey, M. Slaney, G. Chen, A. Agarwal, C. Basoglu, M. Padmilac, A. Kamenev, V. Ivanov, S. Cypher, H. Parthasarathi, B. Mitra, B. Peng, and X. Huang. An introduction to computational networks and the computational network toolkit. Technical report, 2014.
  • M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In HotCloud, pages 1–10, 2010.
  • H. Zhang, Z. Hu, J. Wei, P. Xie, G. Kim, Q. Ho, and E. Xing. Poseidon: A system architecture for efficient GPU-based deep learning on multiple machines. arXiv preprint arXiv:1512.06216, 2015.