A Multiply-and-Accumulate Array for Machine Learning Applications Based on a 3D Nanofabric Flow

IEEE Transactions on Nanotechnology (2021)

Abstract
To sustain the cadence of Moore's law and improve the area, delay, and power of integrated circuits, novel fabrication schemes such as parallel and monolithic 3D integration have recently been proposed. Parallel 3D is limited by the large pitch of through-silicon vias (TSVs), while monolithic 3D suffers from the high cost of additional masks and processing steps, which limits the number of stacked transistor layers. In our previous work, we introduced a novel 3D integration scheme called 3D Nanofabric. Inspired by the 3D NAND flash process, the flow consists of N identical vertical tiers in which multiple vertical layers can be patterned simultaneously, significantly reducing the manufacturing cost. In this paper, we propose to build low-footprint Multiply-And-Accumulate (MAC) units using our 3D Nanofabric flow. Since a MAC unit can be laid out as a regular array, we demonstrate how to arrange it in a 3D fashion across several vertical tiers of the 3D Nanofabric. Through circuit-level evaluations, we show that for a 64-input bit MAC unit consisting of 64 stacked vertical tiers, the area and area-delay product are reduced by 21.0x and 16.7x, respectively, compared to a traditional 2D implementation in a 28 nm FDSOI technology, with only a 43% energy overhead. More importantly, the total fabrication cost is reduced, producing a cost scaling roadmap. Additionally, we show how to build a systolic 3D MAC array aimed at convolutional neural networks. Through architectural evaluations, we demonstrate that when running VGG-16, our 3D MAC array can improve the TOPS/mm² by 2.8x compared to a TPU-like 2D systolic array.
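As a rough illustration of the quantities in the abstract, the sketch below shows what a MAC step computes and what the two reported reduction factors jointly imply for delay. It is not the authors' implementation: the function names are hypothetical, the MAC is modeled purely in software on integers, and the delay estimate assumes that area-delay product is simply area multiplied by delay and that both reported factors refer to the same pair of designs.

```python
from typing import Sequence


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-and-accumulate step: returns acc + a * b."""
    return acc + a * b


def dot_via_mac(xs: Sequence[int], ws: Sequence[int]) -> int:
    """Dot product computed as a chain of MAC steps (hypothetical helper)."""
    if len(xs) != len(ws):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for x, w in zip(xs, ws):
        acc = mac(acc, x, w)
    return acc


def implied_delay_factor(area_reduction: float, adp_reduction: float) -> float:
    """Delay increase implied by the two reduction factors.

    Assumption: ADP = area * delay, so
    delay_3d / delay_2d = area_reduction / adp_reduction.
    """
    return area_reduction / adp_reduction


if __name__ == "__main__":
    print(dot_via_mac([1, 2, 3], [4, 5, 6]))
    print(round(implied_delay_factor(21.0, 16.7), 3))
```

Under these assumptions, a 21.0x area reduction combined with a 16.7x area-delay-product reduction corresponds to a delay increase of roughly 1.26x for the 3D design relative to the 2D baseline.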
Keywords
3D logic integration, emerging technologies, hardware accelerators, nanotechnologies