Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures
CoRR(2023)
摘要
Continual demand for memory bandwidth has made it worthwhile for memory
vendors to reassess processing in memory (PIM), which enables higher bandwidth
by placing compute units in/near-memory. As such, memory vendors have recently
proposed commercially viable PIM designs. However, these proposals are largely
driven by the needs of (a narrow set of) machine learning (ML) primitives.
While such proposals are reasonable given the the growing importance of ML, as
memory is a pervasive component,
more inclusive PIM design that can accelerate primitives across domains.
In this work, we ascertain the capabilities of commercial PIM proposals to
accelerate various primitives across domains. We first begin with outlining a
set of characteristics, termed PIM-amenability-test, which aid in assessing if
a given primitive is likely to be accelerated by PIM. Next, we apply this test
to primitives under study to ascertain efficient data-placement and
orchestration to map the primitives to underlying PIM architecture. We observe
here that, even though primitives under study are largely PIM-amenable,
existing commercial PIM proposals do not realize their performance potential
for these primitives. To address this, we identify bottlenecks that arise in
PIM execution and propose hardware and software optimizations which stand to
broaden the acceleration reach of commercial PIM designs (improving average PIM
speedups from 1.12x to 2.49x relative to a GPU baseline). Overall, while we
believe emerging commercial PIM proposals add a necessary and complementary
design point in the application acceleration space, hardware-software co-design
is necessary to deliver their benefits broadly.
更多查看译文
关键词
commercial inclusive-pim architectures,broad acceleration,hardware-software,co-design
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要