AI helps you reading Science

AI generates interpretation videos

AI extracts and analyses the key points of the paper to generate videos automatically


pub
Go Generating

AI Traceability

AI parses the academic lineage of this thesis


Master Reading Tree
Generate MRT

AI Insight

AI extracts a summary of this paper


Weibo:
We introduced the first core calculus, coreRMA, and its axiomatic semantics, to cleanly capture characteristics of Remote Memory Access programming

Modeling and analysis of remote memory access programming.

OOPSLA, pp.129-144, (2016)

Cited: 7|Views331
EI

Abstract

Recent advances in networking hardware have led to a new generation of Remote Memory Access (RMA) networks in which processors from different machines can communicate directly, bypassing the operating system and allowing higher performance. Researchers and practitioners have proposed libraries and programming models for RMA to enable the ...More

Code:

Data:

0
Introduction
  • Large-scale parallel systems are gaining importance for data center, big data, and scientific computations.
  • The traditional programming models for such systems are message passing and TCP/IP sockets
  • These models were designed for message-based interconnection networks such as Ethernet.
  • Remote Direct Memory Access (RDMA) network interfaces, which have been used in High-Performance Computing for years, offer higher performance at a comparable cost to Ethernet and are finding quick adoption in modern datacenters.
  • To extract the highest performance from such modern interconnects, programmers need to use Remote Memory Access (RMA) programming interfaces, which are replacing traditional message passing models.
  • RMA-capable hardware is in the same price range as standard Ethernet network cards while providing higher performance
Highlights
  • Large-scale parallel systems are gaining importance for data center, big data, and scientific computations
  • We show Remote Memory Access (RMA) behaviors that differ from sequentially consistent (SC) executions as well as behaviors not exhibited by other memory models studied in the literature such as TSO (Owens et al 2009), PSO, and RMO (SPARC International 1992)
  • The interesting behavior in this example is that, even if the ordering of accesses between the same two endpoints is guaranteed by the network, the value of d can be 1. This result is counterintuitive because it appears that the get and put statements before the flush are executed in reverse order, even if the two statements are accesses between the same two endpoints and the network guarantees order between such accesses. This type of behavior is allowed by the RMA networks, and our model enables users to reason about this behavior
  • We introduced the first core calculus, coreRMA, and its axiomatic semantics, to cleanly capture characteristics of Remote Memory Access (RMA) programming
  • We generated bounded-exhaustive test suites using constraint solvers based on our formal model and tested them on real networks
  • Our work serves as a basis for future work on reasoning about RMA programs and can help troubleshoot and design network implementations
Results
  • The authors describe an extensive experimental evaluation for validating the formal model against real-world networks.
  • The authors generated 7,441 tests.
  • Out of these tests, 3,654 have two interacting processes.
  • By generating tests with two processes, the authors exercise both the interaction between nodes and the intra-node interactions between the CPU and the NIC.
  • Generating tests with more than two processes would further stress test the interactions between nodes, which might reveal more interesting behaviours
Conclusion
  • The authors introduced the first core calculus, coreRMA, and its axiomatic semantics, to cleanly capture characteristics of Remote Memory Access (RMA) programming.
  • The authors generated bounded-exhaustive test suites using constraint solvers based on the formal model and tested them on real networks.
  • The authors' suites revealed actual behaviors which network experts did not expect and showed discrepancies between network behaviors and their documentation.
  • The authors' work serves as a basis for future work on reasoning about RMA programs and can help troubleshoot and design network implementations
Tables
  • Table1: coreRMA statements capture the essence of RMA programming. ∗ ∈ {a, n} represents atomicity of an access
  • Table2: coreRMA primitives and corresponding constructs in popular RMA languages
  • Table3: Translation scheme from statements into sets of actions, and corresponding attributes
  • Table4: Translations of paradigmatic statements into actions. Uses ordering relations −p→o and −h→b defined in Section 5.1. Atomicity information selectively omitted
  • Table5: Relations and functions that define an execution
  • Table6: Tests generated from stock coreRMA semantics identify 148 issues over 7,441 tests
Download tables as Excel
Related work
  • We discuss two kinds of related work: work on analyzing RMA-style programs (e.g., MPI) and general work in the analysis of weak memory models.

    Remote Memory Access (RMA) Programming. Specification of RMA libraries and language models is ongoing. The OFED low-level communication interface is currently undergoing a major reform (Hefty 2014). High-level RMA languages have also not stabilized. A new version of MPI RMA is under development. Proposed features such as remote notification (Belli and Hoefler 2015) interact intricately with the memory model.
Reference
  • P. A. Abdulla, M. F. Atig, Y. Chen, C. Leonardsson, and A. Rezine. Automatic fence insertion in integer programs via predicate abstraction. In Static Analysis - 19th International Symposium, SAS 2012, 2012.
    Google ScholarLocate open access versionFindings
  • J. Alglave, D. Kroening, V. Nimal, and M. Tautschnig. Software verification for weak memory via program transformation. In Programming Languages and Systems - 22nd European Symposium on Programming, ESOP 2013, 2013.
    Google ScholarLocate open access versionFindings
  • J. Alglave, D. Kroening, V. Nimal, and D. Poetzl. Don’t sit on the fence—A static analysis approach to automatic fence insertion. In Computer Aided Verification - 26th International Conference, CAV 2014, 2014a.
    Google ScholarLocate open access versionFindings
  • J. Alglave, L. Maranget, and M. Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7, 2014b. doi: 10.1145/2627752.
    Locate open access versionFindings
  • F. Allen, G. Almasi, W. Andreoni, D. Beece, B. J. Berne, A. Bright, J. Brunheroto, C. Cascaval, J. Castanos, P. Coteus, P. Crumley, A. Curioni, M. Denneau, W. Donath, M. Eleftheriou, B. Fitch, B. Fleischer, C. J. Georgiou, R. Germain, M. Giampapa, D. Gresh, M. Gupta, R. Haring, H. Ho, P. Hochschild, S. Hummel, T. Jonas, D. Lieber, G. Martyna, K. Maturu, J. Moreira, D. Newns, M. Newton, R. Philhower, T. Picunko, J. Pitera, M. Pitman, R. Rand, A. Royyuru, V. Salapura, A. Sanomiya, R. Shah, Y. Sham, S. Singh, M. Snir, F. Suits, R. Swetz, W. C. Swope, N. Vishnumurthy, T. J. C. Ward, H. Warren, and R. Zhou. Blue Gene: A vision for protein science using a petaflop supercomputer. IBM Syst. J., 40(2):310–327, Feb. 2001. ISSN 00188670. doi: 10.1147/sj.402.0310.
    Locate open access versionFindings
  • R. Alverson, D. Roweth, and L. Kaplan. The Gemini system interconnect. In Proc. of the IEEE Symposium on High Performance Interconnects (HOTI’10), pages 83–87. IEEE Computer Society, 2010.
    Google ScholarLocate open access versionFindings
  • B. Arimilli, R. Arimilli, V. Chung, S. Clark, W. Denzel, B. Drerup, T. Hoefler, J. Joyner, J. Lewis, J. Li, N. Ni, and R. Rajamony. The PERCS high-performance interconnect. In Proc. of the IEEE Symposium on High Performance Interconnects (HOTI’10), pages 75–82. IEEE Computer Society, Aug. 2010.
    Google ScholarLocate open access versionFindings
  • B. W. Barrett, R. B. Brightwell, K. T. T. Pedretti, K. B. Wheeler, K. S. Hemmert, R. E. Riesen, K. D. Underwood, A. B. Maccabe, and T. B. Hudson. The Portals 4.0 network programming interface. Technical report, Sandia National Laboratories, 2012. SAND2012-10087.
    Google ScholarFindings
  • R. Belli and T. Hoefler. Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization. IEEE, May 2015. Accepted at IPDPS’15.
    Google ScholarLocate open access versionFindings
  • J. C. Blanchette, T. Weber, M. Batty, S. Owens, and S. Sarkar. Nitpicking C++ concurrency. In Proceedings of the 13th International ACM SIGPLAN Symposium on Principles and Practices of Declarative Programming, PPDP ’11, 2011.
    Google ScholarLocate open access versionFindings
  • A. Bouajjani, E. Derevenetc, and R. Meyer. Checking and enforcing robustness against TSO. In Programming Languages and Systems - 22nd European Symposium on Programming, ESOP 2013, 2013.
    Google ScholarLocate open access versionFindings
  • S. Burckhardt and M. Musuvathi. Effective program verification for relaxed memory models. In Computer Aided Verification, 20th International Conference, CAV 2008, 2008.
    Google ScholarLocate open access versionFindings
  • S. Burckhardt, R. Alur, and M. M. K. Martin. Checkfence: checking consistency of concurrent data types on relaxed memory models. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007.
    Google ScholarLocate open access versionFindings
  • J. Burnim, K. Sen, and C. Stergiou. Sound and complete monitoring of sequential consistency for relaxed memory models. In Tools and Algorithms for the Construction and Analysis of Systems - 17th International Conference, TACAS 2011, 2011.
    Google ScholarLocate open access versionFindings
  • D. Chen, N. A. Eisley, P. Heidelberger, R. M. Senger, Y. Sugawara, S. Kumar, V. Salapura, D. L. Satterfield, B. SteinmacherBurow, and J. J. Parker. The IBM Blue Gene/q Interconnection Network and Message Unit. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pages 26:1–26:10, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0771-0. doi: 10.1145/2063384.2063419.
    Locate open access versionFindings
  • Z. Chen, J. Dinan, Z. Tang, P. Balaji, H. Zhong, J. Wei, T. Huang, and F. Qin. Mc-checker: Detecting memory consistency errors in mpi one-sided applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’14, pages 499–510, Piscataway, NJ, USA, 2014. IEEE Press. ISBN 978-1-4799-5500-8.
    Google ScholarLocate open access versionFindings
  • Cray Inc. Using the GNI and DMAPP APIs. Ver. S-2446-52, March 2014. available at: http://docs.cray.com/ (Mar.2014).
    Findings
  • A. M. Dan, Y. Meshman, M. T. Vechev, and E. Yahav. Predicate abstraction for relaxed memory models. In F. Logozzo and M. Fahndrich, editors, Static Analysis - 20th International Symposium, SAS 2013, Seattle, WA, USA, June 20-22, 2013.
    Google ScholarLocate open access versionFindings
  • Proceedings, volume 7935 of Lecture Notes in Computer Science, pages 84–104.
    Google ScholarLocate open access versionFindings
  • Springer, 2013. ISBN 978-3-642-38855-2. doi: 10.1007/978-3-642-38856-9
    Findings
  • 7. A. M. Dan, Y. Meshman, M. T. Vechev, and E. Yahav. Effective abstractions for verification under relaxed memory models. In D. D’Souza, A. Lal, and K. G. Larsen, editors, Verification, Model Checking, and Abstract Interpretation - 16th International Conference, VMCAI 2015, Mumbai, India, January 1214, 2015. Proceedings, volume 8931 of Lecture Notes in Computer Science, pages 449–466.
    Google ScholarLocate open access versionFindings
  • Springer, 2015. ISBN 978-3-66246080-1. doi: 10.1007/978-3-662-46081-8
    Findings
  • 25. A. Dragojevic, D. Narayanan, M. Castro, and O. Hodson. Farm: Fast remote memory. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), pages 401–414, Seattle, WA, Apr. 2014. USENIX Association. ISBN 978-1931971-09-6. D. Dunning, G. Regnier, G. McAlpine, D. Cameron, B. Shubert, F. Berry, A. M. Merritt, E. Gronke, and C. Dodd. The virtual interface architecture. IEEE micro, 18(2):66–76, 1998.
    Google ScholarLocate open access versionFindings
  • G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese, B. Alverson, T. Johnson, J. Kopnick, M. Higgins, and J. Reinhard. Cray Cascade: A scalable HPC system based on a Dragonfly network. In Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’12), pages 103:1–103:9. IEEE Computer Society, 2012. ISBN 978-1-46730804-5.
    Google ScholarLocate open access versionFindings
  • R. Gerstenberger, M. Besta, and T. Hoefler. Enabling Highlyscalable Remote Memory Access Programming with MPI-3 One Sided. In Proc. of the ACM/IEEE Supercomputing, SC ’13, pages 53:1–53:12, 2013.
    Google ScholarLocate open access versionFindings
  • S. Hefty. Scalable fabric interfaces, 2014. OpenFabrics International Developer Workshop 2014.
    Google ScholarLocate open access versionFindings
  • T. Hoefler, J. Dinan, R. Thakur, B. Barrett, P. Balaji, W. Gropp, and K. Underwood. Remote Memory Access Programming in MPI-3. Argonne National Laboratory, Tech. Rep, 2013.
    Google ScholarLocate open access versionFindings
  • N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy, and D. K. Panda. High performance RDMA-based design of HDFS over InfiniBand. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, pages 35:1– 35:35, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press. ISBN 978-1-4673-0804-5.
    Google ScholarLocate open access versionFindings
  • D. Jackson. Software Abstractions: Logic, Language, and Analysis. The MIT Press, 2006. ISBN 0262101149.
    Google ScholarFindings
  • N. Jiang, J. Kim, and W. J. Dally. Indirect adaptive routing on large scale interconnection networks. SIGARCH Comput. Archit. News, 37(3):220–231, June 2009. ISSN 0163-5964.
    Google ScholarLocate open access versionFindings
  • S. Kumar, A. Mamidala, D. A. Faraj, B. Smith, M. Blocksome, B. Cernohous, D. Miller, J. Parker, J. Ratterman, P. Heidelberger, D. Chen, and B. D. Steinmacher-Burrow. PAMI: A parallel active message interface for the Blue Gene/Q supercomputer. In Proc. of the IEEE International Parallel and Distributed Processing Symposium (IPDPS’12), pages 763–773. IEEE Computer Society, 2012.
    Google ScholarLocate open access versionFindings
  • M. Kuperstein, M. T. Vechev, and E. Yahav. Automatic inference of memory fences. In Proceedings of 10th International Conference on Formal Methods in Computer-Aided Design, FMCAD 2010, 2010.
    Google ScholarLocate open access versionFindings
  • M. Kuperstein, M. T. Vechev, and E. Yahav. Partial-coherence abstractions for relaxed memory models. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, 2011.
    Google ScholarLocate open access versionFindings
  • G. Li, R. Palmer, M. DeLisi, G. Gopalakrishnan, and R. M. Kirby. Formal specification of MPI 2.0: Case study in specifying a practical concurrent programming API. Sci. Comput. Program., 76(2):65–81, Feb. 2011. ISSN 0167-6423.
    Google ScholarLocate open access versionFindings
  • A. Linden and P. Wolper. A verification-based approach to memory fence insertion in PSO memory systems. In Tools and Algorithms for the Construction and Analysis of Systems - 19th International Conference, TACAS 2013, 2013.
    Google ScholarLocate open access versionFindings
  • F. Liu, N. Nedev, N. Prisadnikov, M. T. Vechev, and E. Yahav. Dynamic synthesis for relaxed memory models. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, 2012.
    Google ScholarLocate open access versionFindings
  • Y. Meshman, A. M. Dan, M. T. Vechev, and E. Yahav. Synthesis of memory fences via refinement propagation. In Static Analysis 21st International Symposium, SAS 2014, 2014.
    Google ScholarLocate open access versionFindings
  • B. Norris and B. Demsky. CDSchecker: checking concurrent data structures written with C/C++ atomics. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, 2013. R. W. Numrich and J. Reid. Co-array Fortran for parallel programming. SIGPLAN Fortran Forum, 17(2):1–31, 1998.
    Google ScholarLocate open access versionFindings
  • OpenFabrics Alliance (OFA). OpenFabrics Enterprise Distribution (OFED) www.openfabrics.org, 2014.
    Findings
  • S. Owens, S. Sarkar, and P. Sewell. A better x86 memory model: x86-TSO. In Theorem Proving in Higher Order Logics, 22nd International Conference, TPHOLs 2009, Munich, Germany, August 17-20, 2009.
    Google ScholarLocate open access versionFindings
  • Proceedings, 2009.
    Google ScholarLocate open access versionFindings
  • C.-S. Park, K. Sen, P. Hargrove, and C. Iancu. Efficient data race detection for distributed memory parallel programs. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pages 51:1–51:12, New York, NY, USA, 2011. ACM. ISBN 978-14503-0771-0.
    Google ScholarLocate open access versionFindings
  • C. S. Park, K. Sen, and C. Iancu. Scaling data race detection for partitioned global address space programs. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS ’13, pages 47–58, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2130-3. doi: 10.1145/2464996.2465000.
    Locate open access versionFindings
  • M. Poke and T. Hoefler. Dare: High-performance state machine replication on rdma networks. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’15, pages 107–118, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3550-8. doi: 10.1145/ 2749246.2749267.
    Locate open access versionFindings
  • V. Saraswat, G. Almasi, G. Bikshandi, C. Cascaval, D. Cunningham, D. Grove, S. Kodali, I. Peshansky, and O. Tardieu. The asynchronous partitioned global address space model. In AMP ’10: Proceedings of The First Workshop on Advances in Message Passing, June 2010.
    Google ScholarLocate open access versionFindings
  • S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams. Understanding POWER multiprocessors. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011.
    Google ScholarLocate open access versionFindings
  • C. SPARC International, Inc. The SPARC Architecture Manual: Version 8. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1992. ISBN 0-13-825001-4.
    Google ScholarFindings
  • The InfiniBand Trade Association. Infiniband Architecture Spec. Vol. 1, Rel. 1.2. InfiniBand Trade Association, 2004.
    Google ScholarFindings
  • E. Torlak, M. Vaziri, and J. Dolby. Memsat: checking axiomatic specifications of memory models. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, 2010.
    Google ScholarLocate open access versionFindings
  • UPC Consortium. UPC language specifications, v1.2. Technical report, Lawrence Berkeley National Laboratory, 2005. LBNL59208.
    Google ScholarFindings
  • M. Valiev, E. J. Bylaska, N. Govind, K. Kowalski, T. P. Straatsma, H. J. Van Dam, D. Wang, J. Nieplocha, E. Apra, T. L. Windus, et al. NWChem: a comprehensive and scalable open-source solution for large scale molecular simulations. Computer Physics Communications, 181(9):1477–1489, 2010.
    Google ScholarLocate open access versionFindings
0
Your rating :

No Ratings

Tags
Comments
数据免责声明
页面数据均来自互联网公开来源、合作出版商和通过AI技术自动分析结果,我们不对页面数据的有效性、准确性、正确性、可靠性、完整性和及时性做出任何承诺和保证。若有疑问,可以通过电子邮件方式联系我们:report@aminer.cn