Efficient Implementation of Convolution and Winograd on ASMP Embedded Multicore Vector Processor
2020 IEEE Workshop on Signal Processing Systems (SiPS)
Abstract
Efficient inference of Convolutional Neural Networks (CNNs) is a challenging task, and design choices depend heavily on the target context and the size of the CNN model. Many devices are available, each targeting a specific class of application: the best-known target the server side of cloud applications, while others focus on embedded applications. In this paper we show how to exploit the low-level hardware features of an embedded multicore called STxP70 ASMP, in which each core is equipped with a vector coprocessor. This work shows how to adapt the algorithm to the platform and vice versa, and provides an original algorithmic transform to optimize internal resources. Experiments are conducted to study the effect of numerous design parameters and CNN configurations. The results show the benefits of the proposed strategy and outline the low-level hardware features required to further optimize CNN inference.
Keywords
Convolution, Memory management, Optimization, Registers, Standards, Pipelines
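The abstract refers to the Winograd approach to convolution, which trades multiplications for additions; this is attractive on vector processors where multiplier throughput is the bottleneck. The paper's actual ASMP implementation is not reproduced here, but the classic 1D Winograd F(2,3) algorithm it builds on can be sketched as follows: two outputs of a 3-tap convolution are computed with 4 multiplications instead of 6. The function name `winograd_f23` is illustrative, not from the paper.

```python
def direct_conv3(d, g):
    """Direct 3-tap correlation producing 2 outputs (6 multiplies)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]

def winograd_f23(d, g):
    """Winograd F(2,3): same 2 outputs with only 4 multiplies.

    d: 4 input samples, g: 3 filter taps.
    The filter-side factors (g0+g1+g2)/2 and (g0-g1+g2)/2 can be
    precomputed once per filter, so only the 4 multiplies remain
    in the inner loop.
    """
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

For example, with input `[1, 2, 3, 4]` and filter `[1, 0, -1]`, both routines return `[-2, -2]`. In a 2D CNN layer the same idea is applied as F(2x2, 3x3), reducing multiplications per output tile from 36 to 16 at the cost of extra additions and transform memory traffic, which is precisely the kind of trade-off the paper evaluates on the ASMP.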