A Many-core Architecture for an Ensemble Ternary Neural Network Toward High-Throughput Inference.

2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2023

Abstract
Machine learning is expanding into various applications, such as image processing in data centers. With the spread of deep learning, neural-network-based models have frequently been adopted in recent years. Because machine learning inference is slow on a CPU, high-speed dedicated hardware accelerators are often used. In particular, demand for hardware accelerators in data centers is increasing, where low power consumption and high-speed processing in a limited space are required. Here, we propose an implementation method for a ternary neural network (TNN) that utilizes the rewritable look-up tables (LUTs) of a field-programmable gate array (FPGA). TNNs, whose parameters are quantized to 2 bits, can be realized as LUT-based combinational circuits, allowing inference to complete in a single cycle and thus enabling a very high-speed inference system. Moreover, we reduced the hardware cost by 70% by introducing sparsity, i.e., approximating parameters to zero. The low-bit representation, however, comes at the cost of reduced recognition accuracy. In this paper, we use an ensemble of TNNs to achieve recognition accuracy equivalent to that of a 32-bit floating-point model, and we design a voting circuit for the ensemble TNN that does not decrease throughput. Implemented on an AMD Alveo U50 FPGA card, the system achieves a high processing speed of 100 mega frames per second (MFPS). Our FPGA-based system was 1,286 times faster than the CPU and 1,364 times faster than the GPU. We therefore achieve a high-speed inference system without compromising recognition accuracy.
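As a rough illustration of the two ideas described in the abstract, the following Python sketch shows magnitude-based ternarization with a sparsity threshold and a majority vote over ensemble outputs. The function names (ternarize, ensemble_vote) and the thresholding rule are assumptions for illustration only, not the paper's actual quantization or voting scheme, which is realized as combinational FPGA logic.

```python
import numpy as np

def ternarize(weights: np.ndarray, threshold: float) -> np.ndarray:
    # Map each float weight to {-1, 0, +1}; weights whose magnitude
    # falls below `threshold` are approximated to zero, which is what
    # lets the corresponding LUT logic be pruned away in hardware.
    # (Hypothetical rule for illustration; not the paper's method.)
    ternary = np.sign(weights)
    ternary[np.abs(weights) < threshold] = 0.0
    return ternary

def ensemble_vote(predictions: np.ndarray) -> np.ndarray:
    # `predictions` has shape (num_members, batch_size) and holds the
    # class label predicted by each ensemble member for each sample.
    # The result is the most frequent label per sample, a software
    # analogue of the paper's voting circuit.
    num_classes = int(predictions.max()) + 1
    counts = np.apply_along_axis(
        lambda member_votes: np.bincount(member_votes, minlength=num_classes),
        axis=0,
        arr=predictions,
    )
    return counts.argmax(axis=0)

# Example: ternarize four weights, then let three ensemble members
# vote on four samples.
weights = np.array([0.8, -0.05, -0.6, 0.02])
print(ternarize(weights, threshold=0.1))   # [ 1.  0. -1.  0.]

votes = np.array([[0, 1, 2, 1],
                  [0, 1, 1, 1],
                  [2, 1, 2, 0]])
print(ensemble_vote(votes))                # [0 1 2 1]
```

In the example, half of the weights fall below the threshold and are zeroed, matching the abstract's point that sparsity removes hardware; the vote simply picks the plurality label per sample, which is why it can be computed without reducing throughput.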
Keywords
Many-core, Neural Network, Ensemble Neural Network, FPGA