CU.POKer: Placing DNNs on WSE With Optimal Kernel Sizing and Efficient Protocol Optimization

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022)

Abstract
The tremendous growth in deep learning (DL) applications has created an exponential demand for computing power, driving the rise of AI-specific hardware. Targeted at accelerating computation-intensive DL applications, AI hardware, including but not limited to GPGPUs, TPUs, and ASICs, has been adopted ubiquitously. As a result, domain-specific CAD tools play increasingly important roles and are deeply involved in both the design and compilation stages of modern AI hardware. Recently, the ISPD 2020 contest introduced a special challenge targeting the physical mapping of neural network workloads onto the largest commercial DL accelerator, the CS-1 wafer-scale engine (WSE). In this article, we propose CU.POKer, a high-performance engine fully customized for the WSE's deep neural network workload placement challenge. A provably optimal placeable-kernel-candidate searching scheme and a data-flow-aware placement tool are developed to ensure state-of-the-art (SOTA) quality on real industrial benchmarks. Experimental results on the ISPD 2020 contest evaluation suites demonstrate the superiority of the proposed framework over not only the SOTA placer but also conventional heuristics used in general floorplanning.
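The abstract only names the two components (kernel-candidate search and data-flow-aware placement). As a rough illustration of the kind of problem they solve, below is a minimal, hypothetical sketch of a kernel-shape search combined with a data-flow-ordered shelf placer. The cost model (time proportional to work divided by allocated area), the 633x633 fabric size, and every identifier are assumptions made for illustration only; they are not the paper's actual formulation or the contest's evaluation model.

```python
# Hypothetical sketch of a two-stage WSE placement flow: (1) search rectangular
# shape candidates for each kernel and keep the fastest one within an area
# budget, (2) place kernels in data-flow order with a simple shelf packer.
# All names and the cost model are illustrative assumptions.
from dataclasses import dataclass

# Assumed wafer fabric size (~400k cores, roughly matching the CS-1 WSE).
WAFER_W, WAFER_H = 633, 633


@dataclass
class Kernel:
    name: str
    work: float  # abstract compute load; stands in for the contest's kernel parameters


def best_shape(kernel: Kernel, area_budget: int) -> tuple[int, int, float]:
    """Return the (w, h, time) candidate that minimizes modeled execution time.

    Assumed model: time = work / (w * h), i.e. perfect scaling with allocated
    processing elements, so the optimum is the largest rectangle within the
    budget that fits the wafer. A realistic model makes this a genuine search.
    """
    best = None
    for w in range(1, WAFER_W + 1):
        h = min(area_budget // w, WAFER_H)
        if h == 0:
            continue
        cand = (w, h, kernel.work / (w * h))
        if best is None or cand[2] < best[2]:
            best = cand
    return best


def place_shelf(kernels: list[Kernel], area_budget: int):
    """Place kernels left-to-right in data-flow order on horizontal shelves."""
    placements, x, y, shelf_h = [], 0, 0, 0
    for k in kernels:
        w, h, t = best_shape(k, area_budget)
        if x + w > WAFER_W:  # row is full: start a new shelf above it
            x, y, shelf_h = 0, y + shelf_h, 0
        assert y + h <= WAFER_H, "placement overflows the wafer"
        placements.append((k.name, x, y, w, h, t))
        x, shelf_h = x + w, max(shelf_h, h)
    return placements


if __name__ == "__main__":
    net = [Kernel(f"conv{i}", work=1e6 * (i + 1)) for i in range(4)]
    for name, x, y, w, h, t in place_shelf(net, area_budget=40_000):
        print(f"{name}: origin=({x},{y}) size={w}x{h} time={t:.1f}")
    # Pipeline throughput is limited by the slowest kernel, so a real placer
    # would trade area between kernels to balance their modeled times.
```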
Keywords
AI chip compilation, deep learning (DL) accelerator, neural network workload placement, wafer-scale engine (WSE)