Automatically Reinforcing a Game AI.

arXiv: Artificial Intelligence(2016)

引用 23|浏览35
暂无评分
摘要
A recent research trend in Artificial Intelligence (AI) is the combination of several programs into single, stronger, program; this is termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only GPP is available - by decomposing this single GPP into several ones through the use of parameters or even simply random seeds. These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parame- ters; it performs quite well against the original GPP, but performs poorly against an opponent which repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a one game test, and is much more robust against an opponent who learns. We also propose an online learning portfolio, which tests several of the GPP repeatedly and progressively switches to the best - using a bandit algorithm.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要