ECBC: Efficient Convolution via Blocked Columnizing

IEEE Transactions on Neural Networks and Learning Systems (2023)

Abstract
Direct convolution methods are drawing increasing attention because they eliminate the additional storage demanded by indirect convolution algorithms (i.e., the transformed matrix generated by the im2col convolution algorithm). Nevertheless, direct methods require special input–output tensor formatting, incurring extra time and memory to obtain the desired data layout. In this article, we show that indirect convolution, if implemented properly, can achieve high computational performance with the help of highly optimized matrix-multiplication subroutines while avoiding substantial memory overhead. The proposed algorithm is called efficient convolution via blocked columnizing (ECBC). Inspired by the im2col convolution algorithm and the blocked algorithm for general matrix-to-matrix multiplication, we propose to carry out the convolution computation block by block. As a result, the tensor-to-matrix transformation (e.g., the im2col operation) can also be performed blockwise, so it requires only a memory buffer as small as a single data block. Extensive experiments on various platforms and networks validate the effectiveness of ECBC, as well as the superiority of our proposed method over a set of widely used industrial-grade convolution algorithms.
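The blockwise im2col idea described above can be illustrated with a minimal NumPy sketch. This is our own illustration of the general technique, not the authors' implementation; the function name, the `block` parameter, and the loop structure are assumptions. The key point it demonstrates is that the im2col scratch buffer covers only `block` output positions at a time, rather than the full transformed matrix:

```python
import numpy as np

def conv2d_blocked_im2col(x, w, block=8):
    """Convolution via blockwise im2col (illustrative sketch, not ECBC itself).

    x: input tensor of shape (C, H, W)
    w: weight tensor of shape (M, C, kh, kw), stride 1, no padding
    block: number of output positions columnized per GEMM call
    """
    C, H, W = x.shape
    M, C2, kh, kw = w.shape
    assert C == C2
    Ho, Wo = H - kh + 1, W - kw + 1
    # Flatten filters once: M x (C*kh*kw)
    w_mat = w.reshape(M, C * kh * kw)
    out = np.empty((M, Ho * Wo), dtype=x.dtype)
    # Scratch buffer holds only `block` columns, not the full im2col matrix
    buf = np.empty((C * kh * kw, block), dtype=x.dtype)
    for start in range(0, Ho * Wo, block):
        n = min(block, Ho * Wo - start)
        for j in range(n):
            i, col = divmod(start + j, Wo)
            # Columnize one receptive field into the small buffer
            buf[:, j] = x[:, i:i + kh, col:col + kw].ravel()
        # Small GEMM on the current block of columns
        out[:, start:start + n] = w_mat @ buf[:, :n]
    return out.reshape(M, Ho, Wo)
```

The peak extra memory here is `C*kh*kw*block` elements instead of `C*kh*kw*Ho*Wo` for classic im2col, while each block is still computed by a dense matrix multiply that can be dispatched to an optimized GEMM routine.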
Keywords
Convolutional neural networks (CNNs), direct convolution, high-performance computing for mobile devices, im2col convolution, memory-efficient convolution (MEC)