Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS.

ASP-DAC(2019)

引用 36|浏览87
暂无评分
摘要
Neural machine translation (NMT) is a popular topic in Natural Language Processing which uses deep neural networks (DNNs) for translation from source to targeted languages. With the emerging technologies, such as bidirectional Gated Recurrent Units (GRU), attention mechanisms, and beam-search algorithms, NMT can deliver improved translation quality compared to the conventional statistics-based methods, especially for translating long sentences. However, higher translation quality means more complicated models, higher computation/memory demands, and longer translation time, which causes difficulties for practical use. In this paper, we propose a design methodology for implementing the inference of a real-life NMT (with the problem size = 172 GFLOP) on FPGA for improved run time latency and energy efficiency. We use High-Level Synthesis (HLS) to build high-performance parameterized IPs for handling the most basic operations (multiply-accumulations) and construct these IPs to accelerate the matrix-vector multiplication (MVM) kernels, which are frequently used in NMT. Also, we perform a design space exploration by considering both computation resources and memory access bandwidth when utilizing the hardware parallelism in the model and generate the best parameter configurations of the proposed IPs. Accordingly, we propose a novel hybrid parallel structure for accelerating the NMT with affordable resource overhead for the targeted FPGA. Our design is demonstrated on a Xilinx VCU118 with overall performance at 7.16 GFLOPS.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要