Fine-grain task-parallel algorithms for matrix factorizations and inversion on many-threaded CPUs

Concurrency and Computation: Practice and Experience (2022)

Abstract
We extend a two-level task partitioning, previously applied to the inversion of dense matrices via Gauss-Jordan elimination, to the more challenging QR factorization and to the initial orthogonal reduction to band form used in the singular value decomposition. Our new task-parallel algorithms leverage the tasking mechanism currently available in OpenMP to exploit "nested" task parallelism, with a first (outer) level that operates on matrix panels and a second (inner) level that processes the matrix either by $\mu$-panels or by tiles, in order to expose a large number of independent tasks. We present a detailed performance analysis, including execution traces, which shows that the two-level refinement into fine-grained tasks improves load balancing and delivers high performance on current general-purpose many-core processors (CPUs) from Intel and AMD.
Keywords
CPUs, high performance, matrix factorizations, matrix inversion, OpenMP, task parallelism
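
The nested tasking pattern described in the abstract can be illustrated with a minimal sketch in C with OpenMP. This is not the authors' QR or band-reduction algorithm: the kernel is a simple rank-1 update chosen as a stand-in, and the names `update_tile`, `nested_rank1_update`, `panel_w`, and `tile_h` are hypothetical. It only shows the structure of an outer task per column panel that spawns an inner task per tile.

```c
#include <stddef.h>

/* Hypothetical stand-in kernel: rank-1 update of one tile,
 * A[i][j] += alpha * x[i] * y[j], for rows [i0, i1) and columns [j0, j1).
 * A is stored row-major with leading dimension lda. */
static void update_tile(double *A, size_t lda, const double *x, const double *y,
                        double alpha, size_t i0, size_t i1, size_t j0, size_t j1)
{
    for (size_t i = i0; i < i1; ++i)
        for (size_t j = j0; j < j1; ++j)
            A[i * lda + j] += alpha * x[i] * y[j];
}

/* Two-level ("nested") task decomposition of an m x n update:
 * outer level: one task per column panel of width panel_w;
 * inner level: one task per tile of tile_h rows inside that panel.
 * Without OpenMP enabled the pragmas are ignored and the code runs serially. */
void nested_rank1_update(double *A, size_t m, size_t n,
                         const double *x, const double *y, double alpha,
                         size_t panel_w, size_t tile_h)
{
    if (panel_w == 0 || tile_h == 0)
        return; /* avoid a non-terminating loop on a zero block size */

    #pragma omp parallel
    #pragma omp single
    {
        for (size_t j0 = 0; j0 < n; j0 += panel_w) {
            size_t j1 = (j0 + panel_w < n) ? j0 + panel_w : n;

            /* Outer task: owns one panel. */
            #pragma omp task firstprivate(j0, j1)
            {
                for (size_t i0 = 0; i0 < m; i0 += tile_h) {
                    size_t i1 = (i0 + tile_h < m) ? i0 + tile_h : m;

                    /* Inner task: one tile of the panel. */
                    #pragma omp task firstprivate(i0, i1, j0, j1)
                    update_tile(A, n, x, y, alpha, i0, i1, j0, j1);
                }
                /* Wait for this panel's tiles before the outer task ends. */
                #pragma omp taskwait
            }
        }
    } /* implicit barrier: all outer tasks have completed here */
}
```

Compiled with OpenMP support (for example `-fopenmp` with GCC or Clang), calling `nested_rank1_update(A, m, n, x, y, alpha, panel_w, tile_h)` creates roughly `ceil(n / panel_w) * ceil(m / tile_h)` independent inner tasks. Smaller tiles expose more tasks for load balancing at the cost of more scheduling overhead, which is the trade-off the two-level refinement targets. A real factorization would additionally need dependencies between panel factorization and trailing updates, which this sketch omits.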