Automatic Parallelism Management.
Proceedings of the ACM on Programming Languages (2024)
Abstract
On any modern computer architecture, parallelism comes with a modest cost,
born from the creation and management of threads or tasks. Today, programmers
battle this cost by manually tuning their code to minimize the cost
of parallelism without harming its benefit: performance. This is a difficult
battle: programmers must reason about architectural constant factors hidden
behind layers of software abstractions, including thread schedulers and memory
managers, and their impact on performance, including at scale. In languages that
support higher-order functions, the battle hardens: higher-order functions can
make it difficult, if not impossible, to reason about the cost and benefits of
parallelism. Motivated by these challenges and the numerous advantages of high-level
languages, we believe that it has become essential to manage parallelism
automatically so as to minimize its cost and maximize its benefit. This is a
challenging problem, even when considered on a case-by-case,
application-specific basis. But if a solution were possible, then it could
combine the many correctness benefits of high-level languages with performance
by managing parallelism without the programmer effort needed to ensure
performance. This paper proposes techniques for such automatic management of
parallelism by combining static (compilation) and run-time techniques.
Specifically, we consider the Parallel ML language with task parallelism, and
describe a compiler pipeline that embeds "potential parallelism" directly into
the call-stack and avoids the cost of task creation by default. We then pair
this compilation pipeline with a run-time system that dynamically converts
potential parallelism into actual parallel tasks. Together, the compiler and
run-time system guarantee that the cost of parallelism remains low without
losing its benefit. We prove that our techniques have no asymptotic impact on
the work and span of parallel programs and thus preserve their asymptotic
properties. We implement the proposed techniques by extending the MPL compiler
for Parallel ML and show that it can eliminate the burden of manual optimization
while delivering good practical performance.
Keywords
compilers, granularity control, parallel programming languages