Enhancing upper confidence bounds for trees with temporal difference values

Computational Intelligence and Games (2014)

Abstract
Upper confidence bounds for trees (UCT) is one of the most popular and generally effective Monte Carlo tree search (MCTS) algorithms. In practice, however, it is relatively weak when not aided by additional enhancements, and improving its performance without reducing its generality remains an open research challenge. We introduce a new domain-independent UCT enhancement based on the theory of reinforcement learning. Our approach estimates state values in the UCT tree by employing temporal difference (TD) learning, which is known to outperform plain Monte Carlo sampling in certain domains. We present three adaptations of the TD(λ) algorithm to UCT's tree policy and backpropagation step. Evaluations on four games (Gomoku, Hex, Connect Four, and Tic Tac Toe) reveal that our approach increases UCT's level of play comparably to the rapid action value estimation (RAVE) enhancement. Furthermore, it proves highly compatible with a modified all-moves-as-first heuristic, where it considerably outperforms RAVE. The findings suggest that the integration of TD learning into MCTS deserves further research and may form a new class of MCTS enhancements.
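The abstract describes replacing UCT's plain Monte Carlo value backup with a TD(λ)-style backup along the simulated path. The Python sketch below illustrates one plausible such adaptation under stated assumptions; it is not the paper's exact formulation, and the names Node and td_lambda_backup, as well as the parameters alpha, gamma, and lam, are illustrative.

class Node:
    """A UCT tree node that keeps a TD-estimated state value
    instead of a plain Monte Carlo mean (illustrative sketch)."""
    def __init__(self):
        self.visits = 0
        self.value = 0.0

def td_lambda_backup(path, reward, alpha=0.1, gamma=1.0, lam=0.9):
    """Back up a terminal playout outcome from leaf to root with a
    TD(lambda)-style update; `path` lists the visited nodes root-first."""
    next_value = reward      # the playout result bootstraps the leaf
    eligibility = 1.0        # decays with distance from the leaf
    for node in reversed(path):
        node.visits += 1
        # TD error: bootstrap target minus the current estimate
        # (intermediate rewards are zero in these board games).
        td_error = gamma * next_value - node.value
        node.value += alpha * eligibility * td_error
        eligibility *= lam
        next_value = node.value  # parents bootstrap from the updated child

# Example: back up a won playout (reward = 1.0) along a 3-node path.
path = [Node(), Node(), Node()]
td_lambda_backup(path, reward=1.0)

In this sketch λ controls how strongly the terminal outcome, rather than the bootstrapped child estimates, drives updates for nodes farther from the leaf; the paper's three adaptations differ in how such a backup is combined with UCT's tree policy.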
Keywords
Monte Carlo methods, backpropagation, game theory, tree searching, trees (mathematics), MCTS algorithms, Monte Carlo sampling, Monte Carlo tree search algorithms, RAVE enhancement, UCT tree policy, connect four game, domain-independent UCT enhancement, gomoku game, hex game, rapid action value estimation enhancement, reinforcement learning, state value estimation, temporal difference learning, temporal difference values, tic tac toe game, upper confidence bounds for trees