PyTorch: An Imperative Style, High-Performance Deep Learning Library
NeurIPS, pp. 8024–8035, 2019


Abstract

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.

Introduction
  • With the increased interest in deep learning in recent years, there has been an explosion of machine learning tools.
  • Starting in the 1960s, the development of domain-specific languages such as APL [8], MATLAB [9], R [10] and Julia [11] turned multidimensional arrays into first-class objects supported by a comprehensive set of mathematical primitives to manipulate them.
  • Libraries such as NumPy [12], Torch [6], Eigen [13] and Lush [14] made array-based programming productive in general-purpose languages such as Python, Lisp, C++ and Lua. Second, the development of automatic differentiation [15] made it possible to fully automate the daunting labor of computing derivatives.
  • The autograd [16] package popularized the use of this technique for NumPy arrays, and similar approaches are used in frameworks such as Chainer [5], DyNet [7], Lush [14], Torch [6], Jax [17] and Flux.jl [18]; a minimal sketch of this define-by-run style follows below.
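
The define-by-run technique described above is what PyTorch exposes directly. Below is a minimal sketch of that style using PyTorch's public autograd API; the tensor shapes are illustrative, not taken from the paper:

    import torch

    # Leaf tensors marked requires_grad=True are tracked by autograd.
    x = torch.randn(4, 3, requires_grad=True)
    w = torch.randn(3, 2, requires_grad=True)

    # The computation graph is recorded on the fly as ordinary Python runs.
    y = (x @ w).tanh().sum()

    # Reverse-mode automatic differentiation, as popularized by autograd.
    y.backward()
    print(x.grad.shape, w.grad.shape)  # torch.Size([4, 3]) torch.Size([3, 2])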
Highlights
  • With the increased interest in deep learning in recent years, there has been an explosion of machine learning tools
  • Prior work has recognized the value of dynamic eager execution for deep learning, and some recent frameworks implement this define-by-run approach, but do so either at the cost of performance (Chainer [5]) or using a less expressive, faster language (Torch [6], DyNet [7]), which limits their applicability
  • This paper introduces PyTorch, a Python library that performs immediate execution of dynamic tensor computations with automatic differentiation and GPU acceleration, and does so while maintaining performance comparable to the fastest current libraries for deep learning (a short sketch of this execution model appears after this list)
  • Four major trends in scientific computing have become increasingly important for deep learning
  • The development of automatic differentiation [15] made it possible to fully automate the daunting labor of computing derivatives. This made it significantly easier to experiment with different machine learning approaches while still allowing for efficient gradient based optimization
  • PyTorch has become a popular tool in the deep learning research community by combining a focus on usability with careful performance considerations
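
As a hedged illustration of "immediate execution of dynamic tensor computations": in PyTorch, plain Python control flow takes part in the model, so the recorded graph can differ on every call. The function below is a made-up example, not from the paper:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"

    def shrink(x):
        # Ordinary Python control flow inside the model (define-by-run);
        # the loop runs a data-dependent number of times.
        while x.norm() > 1.0:
            x = x / 2
        return x

    x = torch.randn(8, device=device, requires_grad=True)
    shrink(x).sum().backward()  # differentiates through whichever path ran
    print(x.grad)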
Methods
  • Design principles

    PyTorch’s success stems from weaving previous ideas into a design that balances speed and ease of use.
  • PyTorch should be a first-class member of the Python scientific computing ecosystem.
  • It follows the commonly established design goals of keeping interfaces simple and consistent, ideally with one idiomatic way of doing things.
  • It integrates naturally with standard plotting, debugging, and data processing tools (see the interop sketch after this list).
  • The complexity inherent to machine learning should be handled internally by the PyTorch library and hidden behind intuitive APIs free of side-effects and unexpected performance cliffs.
  • Trading 10% of speed for a significantly simpler-to-use model is acceptable; 100% is not.
  • Its implementation accepts added complexity in order to deliver that performance.
  • It is better to have a simple but slightly incomplete solution than a comprehensive but complex and hard-to-maintain design.
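
A small sketch of the ecosystem integration claimed above, using PyTorch's zero-copy NumPy interoperability; the array values are illustrative:

    import numpy as np
    import torch

    a = np.arange(6.0).reshape(2, 3)
    t = torch.from_numpy(a)   # shares memory with the NumPy array, no copy
    t.mul_(2)                 # in-place updates are visible on both sides
    assert a[0, 1] == 2.0

    b = t.numpy()             # back to NumPy, again without copying
    print(b)

Because tensors and arrays share storage, standard Python tools (matplotlib, pdb, pandas) can be applied to model data directly, which is the interoperability the design principles describe.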
Results
  • The authors compare the performance of PyTorch with several other commonly-used deep learning libraries, and find that it achieves competitive performance across a range of tasks.
  • The authors start by quantifying the ability of PyTorch to asynchronously execute dataflow on GPU.
  • The host CPU, which queues the work, quickly outpaces the execution of the operators on the GPU.
  • This allows PyTorch to achieve almost perfect device utilization.
  • In the example the authors analyze, GPU execution takes around three times longer than CPU scheduling.
  • The exact ratio depends on the relative performance of the host CPU and the GPU, as well as the number of elements in each tensor and the average arithmetic complexity of the floating point computations to be performed on the GPU; the timing sketch below shows how this queueing effect can be observed.
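
The asynchronous queueing behavior described above can be observed with a timing sketch like the following; the matrix size and iteration count are assumptions for illustration, and the snippet requires a CUDA-capable GPU:

    import time
    import torch

    assert torch.cuda.is_available()
    x = torch.randn(4096, 4096, device="cuda")

    t0 = time.perf_counter()
    for _ in range(50):
        y = x @ x                    # kernel launches return to Python immediately
    queued = time.perf_counter() - t0

    torch.cuda.synchronize()         # block until the GPU drains its queue
    total = time.perf_counter() - t0
    print(f"CPU queueing: {queued:.4f}s; with GPU execution: {total:.4f}s")

The gap between the two numbers is the GPU work still in flight after the Python loop finished, which is why the host CPU can "outpace" the device.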
Conclusion
  • PyTorch has become a popular tool in the deep learning research community by combining a focus on usability with careful performance considerations.
  • In addition to continuing to support the latest trends and advances in deep learning, in the future the authors plan to continue to improve the speed and scalability of PyTorch.
  • The authors are working on the PyTorch JIT: a suite of tools that allow PyTorch programs to be executed outside of the Python interpreter, where they can be further optimized (see the TorchScript sketch after this list).
  • The authors intend to improve support for distributed computation by providing efficient primitives for data parallelism as well as a Pythonic library for model parallelism based around remote procedure calls.
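
TorchScript is the shipping entry point for the JIT work the authors mention. A minimal sketch follows; the function itself is a made-up example:

    import torch

    @torch.jit.script
    def scaled_relu(x: torch.Tensor, alpha: float) -> torch.Tensor:
        # Compiled to TorchScript IR; runnable without the Python interpreter.
        return torch.relu(x) * alpha

    print(scaled_relu(torch.randn(3), 0.5))
    print(scaled_relu.graph)            # the IR the JIT analyzes and optimizes
    scaled_relu.save("scaled_relu.pt")  # serialized for deployment outside Python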
Tables
  • Table 1: Training speed for 6 models using 32-bit floats. Throughput is measured in images per second for the AlexNet, VGG-19, ResNet-50, and MobileNet models, in tokens per second for the GNMTv2 model, and in samples per second for the NCF model. The fastest speed for each model is shown in bold (a sketch of how such a throughput number can be measured follows below).
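
As a sketch of how a throughput figure like those in Table 1 can be measured; the batch size, iteration count, and the torchvision model choice are assumptions, not the paper's benchmark setup:

    import time
    import torch
    import torchvision.models as models  # assumes torchvision is installed

    model = models.resnet50().cuda().train()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(64, 3, 224, 224, device="cuda")
    target = torch.randint(0, 1000, (64,), device="cuda")

    torch.cuda.synchronize()
    t0, iters = time.perf_counter(), 20
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), target)
        loss.backward()
        opt.step()
    torch.cuda.synchronize()  # include all queued GPU work in the timing
    print(f"{64 * iters / (time.perf_counter() - t0):.1f} images/sec")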
Findings
  • Trading 10% of speed for a significantly simpler-to-use model is acceptable; 100% is not.
  • On all the benchmarks, the performance of PyTorch is within 17% of that of the fastest framework.
References
  • Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
  • Frank Seide and Amit Agarwal. CNTK: Microsoft's open-source deep-learning toolkit. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 2135–2135, New York, NY, USA, 2016. ACM.
  • Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
  • Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.
  • Seiya Tokui, Kenta Oono, Shohei Hido, and Justin Clayton. Chainer: A next-generation open source framework for deep learning. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015.
  • Ronan Collobert, Samy Bengio, and Johnny Mariéthoz. Torch: A modular machine learning software library. Technical report, Idiap, 2002.
  • G. Neubig, C. Dyer, Y. Goldberg, A. Matthews, W. Ammar, A. Anastasopoulos, M. Ballesteros, D. Chiang, D. Clothiaux, T. Cohn, K. Duh, M. Faruqui, C. Gan, D. Garrette, Y. Ji, L. Kong, A. Kuncoro, G. Kumar, C. Malaviya, P. Michel, Y. Oda, M. Richardson, N. Saphra, S. Swayamdipta, and P. Yin. DyNet: The Dynamic Neural Network Toolkit. arXiv e-prints, January 2017.
  • Philip S. Abrams. An APL Machine. PhD thesis, Stanford University, 1970.
  • The MathWorks, Inc., Natick, Massachusetts, United States. MATLAB and Statistics Toolbox.
  • R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
  • Travis Oliphant. NumPy: A guide to NumPy. USA: Trelgol Publishing, 2006. http://www.numpy.org/.
  • Gaël Guennebaud, Benoît Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.
  • Y. LeCun and L. Bottou. Lush reference manual. Technical report, code available at http://lush.sourceforge.net, 2002.
  • Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: A survey. J. Mach. Learn. Res., 18(1):5595–5637, January 2017.
  • Dougal Maclaurin. Modeling, Inference and Optimization with Composable Differentiable Procedures. PhD thesis, Harvard University, April 2016.
  • Matthew Johnson et al. Jax. https://github.com/google/jax, 2018.
  • Mike Innes et al. Flux.jl. https://github.com/FluxML/Flux.jl, 2018.
  • Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. http://www.scipy.org/.
  • Wes McKinney. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference, pages 51–56, 2010.
  • Pierre Sermanet, Koray Kavukcuoglu, and Yann LeCun. EBLearn: Open-source energy-based learning in C++. In 2009 21st IEEE International Conference on Tools with Artificial Intelligence, pages 693–697. IEEE, 2009.
  • Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan D. Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. cuDNN: Efficient primitives for deep learning. CoRR, abs/1410.0759, 2014.
  • Andrew Lavin. maxDNN: An efficient convolution kernel for deep learning with Maxwell GPUs, January 2015.
  • Andrew Lavin and Scott Gray. Fast algorithms for convolutional neural networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4013–4021, 2016.
  • Ronan Collobert, Koray Kavukcuoglu, and Clément Farabet. Torch7: A MATLAB-like environment for machine learning. In NIPS 2011, 2011.
  • Richard Gabriel. The rise of worse is better. http://dreamsongs.com/RiseOfWorseIsBetter.html.
  • Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
  • Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy P. Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, and Rodney Tsing. StarCraft II: A new challenge for reinforcement learning. CoRR, abs/1708.04782, 2017.
  • DMLC. DLPack: Open in-memory tensor structure. https://github.com/dmlc/dlpack.
  • Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Workshop, 2017.
  • Dan Piponi. Automatic differentiation, C++ templates, and photogrammetry. J. Graphics, GPU, & Game Tools, 9(4):41–55, 2004.
  • Holger Leuck and Hans-Hellmut Nagel. Automatic differentiation facilitates OF-integration into steering-angle-based road vehicle tracking. In 1999 Conference on Computer Vision and Pattern Recognition (CVPR '99), 23–25 June 1999, Ft. Collins, CO, USA, pages 2360–2365, 1999.
  • The Python team. Global Interpreter Lock. https://wiki.python.org/moin/GlobalInterpreterLock.
  • Giovanni Petrantoni and Jörg Wollenschläger. NimTorch. https://github.com/fragcolorxyz/nimtorch.
  • Austin Huang, Junji Hashimoto, and Sam Stites. Hasktorch. https://github.com/hasktorch/hasktorch.
  • G. Synnaeve, Z. Lin, J. Gehring, D. Gant, V. Mella, V. Khalidov, N. Carion, and N. Usunier. Forward modeling for partial observation strategy games - a StarCraft defogger. In Advances in Neural Information Processing Systems, pages 10761–10771, 2018.
  • The PyTorch team. Torch Script. https://pytorch.org/docs/stable/jit.html.
  • Justin Luitjens. CUDA streams. GPU Technology Conference, 2014.
  • Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A scalable memory allocator for multithreaded applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IX, pages 117–128, New York, NY, USA, 2000. ACM.
  • J. Evans. A scalable concurrent malloc(3) implementation for FreeBSD. In BSDCan - The Technical BSD Conference, May 2006.
  • S. Ghemawat and P. Menage. TCMalloc: Thread-caching malloc.
  • Benjamin Recht, Christopher Ré, Stephen J. Wright, and Feng Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, pages 693–701, 2011.
  • Matthew Hertz and Emery D. Berger. Quantifying the performance of garbage collection vs. explicit memory management. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pages 313–326, New York, NY, USA, 2005. ACM.
  • The PyTorch team. PyTorch Autograd Profiler. https://pytorch.org/docs/1.0.1/autograd.html#profiler.
Authors
Francisco Massa
Gregory Chanan
Trevor Killeen