Beyond UCT: MAB Exploration Improvements for Monte Carlo Tree Search

2023 IEEE Conference on Games (CoG)

Abstract
Monte Carlo Tree Search (MCTS) employs Multi-Armed Bandit (MAB) techniques to direct the policy for child node selection during tree construction. Typical MCTS implementations have relied on the Upper Confidence Bounds for Trees (UCT) strategy, which leverages a specific variant of the general Upper Confidence Bounds (UCB) approach. The success of such strategies relies heavily on the proper tuning of the UCB C parameter to guide exploration effectively. This paper examines (1) the advantages of per-arm tuning of C, (2) the potential for a parameter-less UCB variant called UCBT to provide opportunities for automatic derivation of effective C values without prior tuning in a strategy called Poly-UCB1, and (3) the application of both of these concepts toward operational tuning of C during MCTS node expansion and tree construction in a strategy called UCB-Multi. We evaluate our approach in three turn-based, adversarial board games.
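For context, here is a minimal Python sketch of the standard UCT child-selection rule the abstract builds on, scoring each child by mean reward plus C * sqrt(ln(N) / n). The function name and argument layout are illustrative assumptions rather than the paper's code; the per-arm tuning, Poly-UCB1, and UCB-Multi strategies described above would replace the single fixed C used here.

import math

def uct_select(children, parent_visits, c=1.41):
    """Return the index of the child maximizing the UCB1/UCT score.

    children: list of (mean_reward, visit_count) pairs.
    parent_visits: total visit count N of the parent node.
    c: exploration constant; the paper studies deriving and tuning
       this value (including per arm) instead of fixing it globally.
    """
    def score(child):
        mean, n = child
        if n == 0:
            return float("inf")  # prefer unvisited children
        return mean + c * math.sqrt(math.log(parent_visits) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))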
Keywords
Multi-Armed Bandits,UCB1,UCT,Monte Carlo Tree Search