Declarative abstractions for tensor program partitioning.

PPDP 2020

Abstract
The size of state-of-the-art machine learning models is continuously growing; for instance, GPT-3, a recent language model trained by OpenAI, contains 175B parameters. Due to memory limitations and scalability constraints, hardware acceleration for such models relies on configuring them as systems of accelerator devices (such as GPUs, TPUs, or even simple compute cores with fast local memory) with custom interconnect networks. This setting poses a challenge for software: there is an increasing need for flexible ways to distribute these multi-dimensional array programs (tensor programs) across systems of accelerator devices. We outline in this talk how ideas from deforestation and stream fusion are relevant to the domain of tensor programming and partitioning. Specifically, we see how the concept of array “builders”, aimed primarily at code generation, can be extended to array “slicers”. Array slicers, together with algebraic representations of range objects and declarative rewrite rules, can express a variety of different, accelerator-agnostic distribution strategies. We will see how a tensor IR can be extended with such abstractions, how we can drive partitioning through user annotations or interactive tactics, and – as a demonstration – how it may be lowered to a low-level executable dataflow graph of SPMD kernels. We will finally discuss some remaining hard problems and further transformations that are essential for scaling up models on systems of accelerators, and where ideas from declarative programming could prove useful.
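The builder-to-slicer idea lends itself to a small functional sketch. The Haskell snippet below is a minimal illustration of the general concept, not the talk's actual IR or API: the names `Range`, `Slicer`, `splitRange`, and `partition` are hypothetical, introduced only for this example. It pairs a symbolic half-open range with a function that computes only the slice of an output axis covered by that range, and a partitioning rewrite splits the axis into per-device chunks, each of which could in principle be lowered to its own SPMD kernel.

```haskell
-- A symbolic half-open range [lo, hi) over one array axis.
data Range = Range { lo :: Int, hi :: Int }
  deriving Show

-- A slicer: given a sub-range of an output axis, compute only that slice.
newtype Slicer a = Slicer { runSlicer :: Range -> [a] }

-- Algebraic split of a range into k contiguous chunks (one per device).
splitRange :: Int -> Range -> [Range]
splitRange k (Range l h) =
  [ Range (l + i * step) (min h (l + (i + 1) * step)) | i <- [0 .. k - 1] ]
  where step = (h - l + k - 1) `div` k

-- An elementwise producer expressed as a slicer: the partitioning rewrite
-- below needs no knowledge of the function body, only of the range algebra.
mapSlicer :: (Int -> a) -> Slicer a
mapSlicer f = Slicer (\(Range l h) -> [ f i | i <- [l .. h - 1] ])

-- Declarative rewrite: evaluating a whole axis of length n becomes k
-- independent per-device slices; a real system would lower each slice to
-- an SPMD kernel rather than materialise lists on the host.
partition :: Int -> Int -> Slicer a -> [[a]]
partition k n s = map (runSlicer s) (splitRange k (Range 0 n))

main :: IO ()
main =
  -- Split a length-10 axis of squares across 3 hypothetical devices.
  mapM_ print (partition 3 10 (mapSlicer (\i -> i * i)))
```

For elementwise producers the per-device slices are fully independent; partitioning a reduction or contraction axis would additionally require a cross-device combine step, which is where richer range algebra and rewrite rules of the kind discussed in the talk become necessary.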