Learning from Optimizing Matrix-Matrix Multiplication

2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)(2018)

引用 3|浏览17
暂无评分
摘要
We describe a learning process that uses one of the simplest examples, matrix-matrix multiplication, to illustrate issues that underlie parallel high-performance computing. It is accessible at multiple levels: simple enough to use early in a curriculum yet rich enough to benefit a more advanced software developer. A carefully designed and scaffolded set of exercises leads the learner from a naive implementation towards one that extracts parallelism at multiple levels, ranging from instruction level parallelism to multithreaded parallelism via OpenMP to distributed memory parallelism using MPI. The importance of effectively leveraging the memory hierarchy within and across nodes is exposed, as do the GotoBLAS and SUMMA algorithms. These materials will become part of a Massive Open Online Course (MOOC) to be offered in the future.
更多
查看译文
关键词
parallel computing,high performance computing,matrix-matrix multiplication,computing education,open education
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要