QTAccel: A Generic FPGA-Based Design for Q-Table Based Reinforcement Learning Accelerators

FPGA (2020)

Cited 11 | Views 29
Abstract
Q-Table based Reinforcement Learning (QRL) is a class of widely used AI algorithms that work by successively improving estimates of Q-values - the quality of state-action pairs - stored in a table. These algorithms significantly outperform neural-network-based techniques when the state space is tractable. Fast learning for AI applications in several domains (such as robotics), with tractable "mid-sized" Q-tables, still requires a large number of rapid updates. State-of-the-art FPGA implementations of QRL do not scale well with increasing Q-table state space and are therefore inefficient for such applications. In this work, we develop a novel FPGA-based design for QRL and SARSA (State Action Reward State Action) that scales to large state spaces, thereby facilitating a large class of AI applications. Our architecture provides higher throughput while using significantly fewer on-chip resources. It supports a variety of action selection policies covering Q-Learning and variations of bandit algorithms, and can be easily extended to multi-agent Q-learning. Our pipelined implementation fully handles the dependencies between consecutive updates, allowing it to process one sample every clock cycle. We evaluate our architecture on the Q-Learning and SARSA algorithms and show that our designs achieve a throughput of up to 180 million samples per second.
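For reference, the two tabular update rules the accelerator targets can be sketched in a few lines. This is a minimal software sketch of standard Q-Learning (off-policy) and SARSA (on-policy) updates, not the paper's FPGA pipeline; the state/action sizes and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in the next state.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action actually taken next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

# Toy Q-table: 4 states x 2 actions (sizes chosen for illustration).
Q = np.zeros((4, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)   # Q[0, 1] -> 0.1
sarsa_update(Q, s=2, a=0, r=0.5, s_next=3, a_next=1)  # Q[2, 0] -> 0.05
```

The read-modify-write on `Q[s, a]` is the dependency the paper's pipeline must resolve when consecutive samples touch the same state-action pair, which is what allows one update per clock cycle in hardware.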
Keywords
FPGA based design,Q-table based reinforcement learning accelerators,QRL,state-action pairs,AI applications,state action reward state action,action selection policies,bandit algorithms,multiagent Q learning,Q-table state space,QTAccel,SARSA,on-chip resources,clock cycle