Optimized high-level synthesis of SMT multi-threaded hardware accelerators

2015 International Conference on Field Programmable Technology (FPT) (2015)

Abstract
Recent high-level synthesis (HLS) tools can generate multi-threaded micro-architectures to hide memory access latencies. Many HLS flows achieve this simply by creating multiple processing-element instances, one for each thread. More advanced compilers, however, can synthesize hardware as a spatial form of the barrel-processor or simultaneous multi-threading (SMT) approach, in which only state storage is replicated per thread, while the hardware operators of a single datapath are re-used between threads. The spatial nature of the micro-architecture applies not only to the hardware operators but also to the thread scheduling facility, which is itself distributed across the entire datapath in separate hardware stages. Each of these thread scheduling stages, which also allow threads to be re-ordered, adds hardware overhead, so it is worthwhile to examine how their number can be reduced while maintaining the performance of the entire datapath. We report on a number of thinning options and examine their impact on system performance. For kernels from the MachSuite HLS benchmark collection, we have achieved area savings of up to 50% LUTs and 50% registers, while maintaining full performance for the compiled hardware accelerators.
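The latency-hiding idea behind the barrel-processor style can be illustrated with a toy cycle-count model. The sketch below is not the paper's micro-architecture: it assumes a single central round-robin scheduler (the paper describes scheduling distributed across datapath stages), treats every operation as a memory access with a fixed latency, and uses hypothetical names (`ThreadState`, `run`). It only shows why replicating per-thread state while sharing one issue slot keeps the shared datapath busy.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Per-thread state; the only thing replicated per thread in this toy model."""
    ops_left: int      # operations this thread still has to issue
    ready_at: int = 0  # first cycle at which the thread may issue again


def run(num_threads: int, ops_per_thread: int, mem_latency: int) -> int:
    """Return the cycle at which the last operation completes.

    Assumption: one shared issue slot per cycle, and every operation
    blocks its own thread for `mem_latency` cycles.
    """
    threads = [ThreadState(ops_per_thread) for _ in range(num_threads)]
    cycle = 0
    next_tid = 0
    while any(t.ops_left > 0 for t in threads):
        # Round-robin search for a thread that is ready in this cycle.
        for offset in range(num_threads):
            tid = (next_tid + offset) % num_threads
            t = threads[tid]
            if t.ops_left > 0 and t.ready_at <= cycle:
                t.ops_left -= 1
                t.ready_at = cycle + mem_latency
                next_tid = (tid + 1) % num_threads
                break
        cycle += 1  # the slot stays idle if no thread was ready
    return max(t.ready_at for t in threads)


if __name__ == "__main__":
    print(run(1, 8, 4))  # one thread: the issue slot idles while it waits
    print(run(4, 8, 4))  # four threads: four times the work, similar time
```

In this model, one thread with 8 operations and a latency of 4 cycles finishes at cycle 32, while four such threads sharing the same issue slot finish at cycle 35, because some thread is ready in every cycle.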
Keywords
optimized high-level synthesis,look-up table,thread scheduling,hardware microarchitecture,simultaneous multithreading processor,barrel processor,multithreaded microarchitecture,SMT multithreaded hardware accelerators