Variable batch size across layers for efficient prediction on CNNs

2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 2020

Abstract
CNNs are used extensively for computer vision tasks such as activity recognition, image classification, and segmentation. The large compute memory required by these applications restricts the use of large batch sizes during inference, thereby increasing the overall prediction time. Prior work addresses this issue through model compression mechanisms such as weight/filter pruning and quantization of parameters or intermediate outputs. We propose a complementary technique that improves inference time by using variable batch sizes (VBS) across the layers of a CNN. This optimises the memory-time trade-off for each layer and leads to better network throughput. Our approach makes no modifications to the existing network (unlike pruning or quantization techniques) and thus has no impact on model accuracy. We develop a dynamic programming (DP) based algorithm that takes the inference time and memory required by different layers of the network as input, and computes the optimal batch size for each layer depending on the available resources (RAM, storage space, etc.). We demonstrate our findings in two different settings: video inference on K80 GPUs and image inference on edge devices. On video networks such as C3D, our VBS algorithm gives up to 61% higher throughput than a fixed batch size baseline. On image networks such as GoogleNet and ResNet50, we achieve up to 60% higher throughput than a fixed batch size baseline.
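The abstract does not spell out the DP formulation, but the idea can be illustrated. Below is a minimal sketch, assuming profiled per-layer tables `time[l][b]` (latency of layer l at batch size b) and `mem[l][b]` (peak memory of layer l at batch size b), plus a hypothetical per-sample activation size `act_size[l]` used to model the staging buffer needed when adjacent layers run at different batch sizes. These names and the buffering model are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative DP sketch (assumed formulation, not the paper's exact one):
# pick a batch size per layer to minimize total per-image latency under a
# memory budget. The state is the batch size chosen for the previous layer,
# since mismatched adjacent batch sizes require staging activations.
from math import lcm, inf

def optimal_batch_sizes(time, mem, act_size, budget, batch_options):
    """Return one batch size per layer minimizing per-image latency."""
    n = len(time)
    # best[b] = (cost so far, plan so far) for prefixes ending at batch size b
    best = {b: (0.0, []) for b in batch_options}
    for l in range(n):
        new_best = {}
        for b in batch_options:
            cand = (inf, None)
            for b_prev, (cost, plan) in best.items():
                # assumed model: stage lcm(b_prev, b) activations of layer l's
                # input when adjacent layers use different batch sizes
                staged = lcm(b_prev, b) * act_size[l] if l > 0 else 0
                if mem[l][b] + staged > budget:
                    continue  # this choice exceeds available memory
                total = cost + time[l][b] / b  # per-image latency of layer l
                if total < cand[0]:
                    cand = (total, plan + [b])
            if cand[1] is not None:
                new_best[b] = cand
        best = new_best
        if not best:
            raise ValueError("no feasible batch-size plan under this budget")
    # cheapest complete plan over all final-layer batch sizes
    return min(best.values())[1]
```

For example, with `batch_options = [1, 2, 4, 8, 16]` and tables profiled offline, the returned plan lists one batch size per layer; layers with small activations tend to get larger batches, which is the memory-time trade-off the abstract describes.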
Keywords
convolutional neural network, inference, batch