Efficient parallel branch-and-bound search on FPGAs using work stealing and instance-specific designs

Semantic Scholar (2019)

Abstract

In recent years, the increasing use of technology and the demand for market analysis have led to ever more data being generated. The need to analyze and process this data is now present in many combinatorial optimization and planning problems. To take advantage of the promises of the digital age, search algorithms must be efficient, and so must their implementations, in terms of both performance and energy consumption. Only the combination and fine-tuning of efficient algorithms and efficient implementations on suitable platforms can deliver high performance at low energy consumption. One of the most common methods for processing such very large search spaces is branch-and-bound (B&B) search. B&B algorithms are highly relevant because they are used to solve many real-world operational problems (e.g. production and personnel planning, scheduling, and complex decision processes). The search space of a branch-and-bound search is organized as a tree, and the algorithm tries to eliminate infeasible solutions as early as possible by using a bounding function to prune unpromising subtrees. Since the pruned subtrees no longer have to be considered, the computing effort is reduced considerably in some cases. In this thesis, we study the efficient realization of branch-and-bound algorithms on field-programmable gate arrays (FPGAs), a topic that is still insufficiently understood. FPGAs are integrated circuits consisting of programmable logic blocks and programmable interconnects that can be specialized for specific applications after the chip has been manufactured. Branch-and-bound problems are inherently difficult and are not the typical class of problems tackled with FPGAs, because they are control-driven rather than data-driven. On the other hand, FPGAs have proven to be highly efficient in terms of chip area, power consumption and performance for a wide range of other, more suitable application domains.
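The pruning idea described above can be illustrated with a small software sketch. The example below is not taken from the thesis and says nothing about its FPGA architecture; it assumes a 0/1 knapsack problem as the instance and a greedy fractional relaxation as the bounding function, and the names `knapsack_branch_and_bound`, `bound` and `search` are hypothetical. It only shows how a bounding function lets the search discard whole subtrees.

```python
from typing import List, Tuple


def knapsack_branch_and_bound(
    values: List[int], weights: List[int], capacity: int
) -> Tuple[int, int]:
    """Return (best value, number of visited tree nodes).

    Illustrative sketch only: assumes all weights are positive.
    """
    # Sort by value density so the greedy fractional fill is a valid upper bound.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)
    best = 0
    visited = 0

    def bound(i: int, value: float, room: int) -> float:
        # Optimistic estimate for the subtree rooted at item i:
        # fill the remaining room greedily, allowing one fractional item.
        for v, w in items[i:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    def search(i: int, value: int, room: int) -> None:
        nonlocal best, visited
        visited += 1
        if value > best:
            best = value  # every node is a feasible partial selection
        if i == n:
            return
        if bound(i, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        v, w = items[i]
        if w <= room:
            search(i + 1, value + v, room - w)  # branch: take item i
        search(i + 1, value, room)  # branch: skip item i

    search(0, 0, capacity)
    return best, visited


if __name__ == "__main__":
    best, visited = knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50)
    print(best, visited)
```

How many nodes are pruned depends heavily on the instance and on how tight the bound is, which matches the abstract's remark that the effort is reduced considerably only in some cases.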
In this thesis, we bridge this gap and show that custom hardware designs can significantly accelerate the execution of these algorithms. First, we identify the general elements of B&B algorithms and develop and demonstrate their efficient implementation as a finite state machine on FPGAs. Our architecture trades off highly optimized combinational datapaths for the performance-critical parts of the search tree against more resource-efficient pipelined datapaths for the less frequent and more complex parts. We then extend our design with two optimization techniques to further improve efficiency. For the first optimization, we introduce the concept of hardware workers that cooperate autonomously using work stealing, which allows parallel execution of branch-and-bound algorithms and full utilization of the target FPGA. The hardware workers dynamically share and balance their work and achieve near-linear speedups. For the second optimization, we explore the advantages of instance-specific designs for B&B algorithms, which target a specific problem instance to improve performance, and combine them with the work-stealing design. The instance-specific design
