Where, When & Which Concepts Does AlphaZero Learn? Lessons from the Game of Hex

Jessica Zosa Forde, Charles Lovering, George Konidaris, Ellie Pavlick, Michael L. Littman

Semantic Scholar (2022)

Abstract

AlphaZero, an approach to reinforcement learning that couples neural networks and Monte Carlo tree search (MCTS), has produced state-of-the-art strategies for traditional board games like chess, Go, and Hex. While researchers and game commentators have suggested that AlphaZero uses concepts humans consider important, it is unclear how these concepts are represented in the network. We investigate AlphaZero’s representations in Hex using both model probing and behavioral tests. We find that MCTS initially finds important concepts, and then the neural network learns to encode these concepts. Concepts related to short-term end-game planning are best encoded in the final layers of the model, whereas concepts related to long-term planning are encoded in the middle layers of the model.
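To make the idea of model probing concrete, here is a minimal sketch of a linear probe. It is not the authors' code or setup: the activations are synthetic stand-ins, the layer names (`encoding_layer`, `non_encoding_layer`) and helper functions are hypothetical, and a ridge-regression classifier is assumed as the probe. The point is only that a probe scores well on a layer whose activations linearly encode a concept and near chance on one that does not.

```python
import numpy as np


def fit_linear_probe(acts, labels, l2=1e-2):
    """Fit a ridge-regression probe from activations to +/-1 concept labels."""
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    y = 2.0 * labels - 1.0  # map {0, 1} labels to {-1, +1} targets
    # Closed-form ridge solution: (X^T X + l2 * I) w = X^T y
    return np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ y)


def probe_accuracy(w, acts, labels):
    """Fraction of examples whose concept label the probe predicts correctly."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    preds = (X @ w > 0).astype(int)
    return float((preds == labels).mean())


# Synthetic stand-ins for network activations (assumption: no real model here).
rng = np.random.default_rng(0)
n, d = 2000, 32
labels = rng.integers(0, 2, size=n)  # 1 = concept present on the board
direction = rng.normal(size=d)       # direction along which the concept is encoded

layers = {
    # Noise plus a label-dependent shift: the concept is linearly decodable.
    "encoding_layer": rng.normal(size=(n, d)) + np.outer(2.0 * labels - 1.0, direction),
    # Pure noise: the concept is not represented at all.
    "non_encoding_layer": rng.normal(size=(n, d)),
}

train, test = slice(0, 1500), slice(1500, None)
results = {}
for name, acts in layers.items():
    w = fit_linear_probe(acts[train], labels[train])
    results[name] = probe_accuracy(w, acts[test], labels[test])

for name, acc in results.items():
    print(f"{name}: held-out probe accuracy = {acc:.2f}")
```

In a real study the synthetic arrays would be replaced by activations recorded from each layer of the trained network on labeled board positions, and comparing held-out probe accuracy across layers would indicate where a concept is most readily decodable.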
