FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge.

Azzam Alhussain,Mingjie Lin

CoRR(2023)

引用 0|浏览1
暂无评分
摘要
Accelerating Human Action Recognition (HAR) efficiently for real-time surveillance and robotic systems on edge chips remains a challenging research field, given its high computational and memory requirements. This paper proposed an integrated end-to-end HAR scalable HW/SW accelerator co-design based on an enhanced 8-bit quantized Two-Stream SimpleNet-PyTorch CNN architecture. Our network accelerator was trained on UCF101 and UCF24 datasets and implemented on edge SoC-FPGA. Our development uses partially streaming dataflow architecture to achieve higher throughput versus network design and resource utilization trade-off. We also fused all convolutional, batch-norm, and ReLU operations into a single homogeneous layer and utilized the Lucas-Kanade motion flow method to enable a high parallelism accelerator design and optimized on-chip engine computing.Furthermore, our proposed methodology achieved nearly 81% prediction accuracy with an approximately 24 FPS real-time inference throughput at 187MHz on ZCU104, which is 1.7x - 1.9x higher than the prior research. Lastly, the designed framework was benchmarked against several hardware chips for higher throughput and performance measurements and is now available as an open-source project on GitHub for training and implementation on edge platforms.
更多
查看译文
关键词
quantized human action recognition
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要