Tucker Tensor Decomposition on FPGA

2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Cited by: 13
Abstract
Tensor computation has emerged as a powerful mathematical tool for solving high-dimensional and/or extreme-scale problems in science and engineering. The last decade has witnessed tremendous advances in tensor computation and its applications in machine learning and big data. However, its hardware optimization on resource-constrained devices remains an (almost) unexplored field. This paper presents a hardware accelerator for a classical tensor computation framework, Tucker decomposition. We study three modules of this architecture: tensor-times-matrix (TTM), matrix singular value decomposition (SVD), and tensor permutation, and implement them on a Xilinx FPGA for prototyping. To further reduce the computing time, a warm-start algorithm for the Jacobi iterations in SVD is proposed. A fixed-point simulator is used to evaluate the performance of our design. Synthetic data sets and a real MRI data set are used to validate the design and evaluate its performance. Compared with state-of-the-art software toolboxes running on both CPU and GPU, our design achieves a 2.16–30.2× speedup on the cardiac MRI data set.
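For readers unfamiliar with the decomposition the accelerator targets, the sketch below shows a minimal software version of truncated Tucker decomposition (HOSVD) built from the two kernels named in the abstract, TTM and matrix SVD. It is an illustrative NumPy example only; the function names (`unfold`, `ttm`, `tucker_hosvd`) are assumptions for this sketch and do not reflect the paper's fixed-point FPGA architecture, its tensor-permutation module, or its warm-start Jacobi SVD.

```python
# Illustrative Tucker (truncated HOSVD) sketch using TTM and SVD kernels.
# Floating-point NumPy reference, not the paper's fixed-point FPGA design.
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def ttm(tensor, matrix, mode):
    """Tensor-times-matrix (TTM): contract `matrix` with the tensor's `mode` axis."""
    moved = np.moveaxis(tensor, mode, 0)
    flat = matrix @ moved.reshape(moved.shape[0], -1)
    return np.moveaxis(flat.reshape((matrix.shape[0],) + moved.shape[1:]), 0, mode)

def tucker_hosvd(tensor, ranks):
    """Factor matrices from the leading left singular vectors of each unfolding;
    core tensor from TTMs with their transposes."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = ttm(core, u.T, mode)
    return core, factors

# Example: compress a random 3-way tensor to multilinear rank (4, 4, 4).
x = np.random.rand(16, 16, 16)
core, factors = tucker_hosvd(x, (4, 4, 4))
approx = core
for mode, u in enumerate(factors):
    approx = ttm(approx, u, mode)
print("relative error:", np.linalg.norm(x - approx) / np.linalg.norm(x))
```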
Keywords
GPU, CPU, tensor-times-matrix, tensor computation, MRI data set, Xilinx FPGA, tensor permutation, SVD, matrix singular value decomposition, classical tensor computation framework, hardware accelerator, resource-constrained devices, hardware optimization, big data, machine learning, mathematical tool, Tucker tensor decomposition