Stacked calibration of off-policy policy evaluation for video game matchmaking

Computational Intelligence in Games (2013)

Abstract
We consider an industrial-strength application of recommendation systems to video-game matchmaking, in which off-policy policy evaluation is important but standard approaches cannot readily be applied. The objective of the policy is to sequentially form teams of players from those waiting to be matched, so as to produce well-balanced matches. Unfortunately, the available training data come from a policy that is neither perfectly known nor stochastic, which rules out methods based on importance weights. Furthermore, we observe that when the estimated reward function and the policy are trained on the same off-policy dataset, policy evaluation using the estimated reward function is biased. We present a simple calibration procedure, similar to stacked regression, that removed most of this bias in our experiments. Data collected during beta tests of Ghost Recon Online, a first-person shooter from Ubisoft, were used for the experiments.
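The abstract only sketches the stacked-calibration idea. As a rough illustration of the general technique it names, here is a minimal cross-fitting sketch in Python: a reward model's out-of-fold predictions are used to fit a calibration map, correcting the optimism that arises when the same data train the reward model and evaluate the policy. All names and model choices (`RewardModel` stand-ins, fold count, match features) are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of stacked calibration for off-policy evaluation.
# Assumes a dataset of (match_features, observed_reward) pairs; the
# estimators and fold count below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def stacked_calibration(X, y, n_folds=5, seed=0):
    """Fit a reward model plus a calibration map from out-of-fold predictions.

    Returns (reward_model, calibrator). The reward model is trained on the
    full dataset; the calibrator is a 1-D regression fit on held-out
    predictions only, so it corrects the optimistic bias that appears when
    the same data are used both to train and to evaluate.
    """
    oof_pred = np.empty_like(y, dtype=float)
    for train_idx, held_idx in KFold(n_folds, shuffle=True,
                                     random_state=seed).split(X):
        # Train on the other folds, predict on the held-out fold.
        fold_model = GradientBoostingRegressor(random_state=seed)
        fold_model.fit(X[train_idx], y[train_idx])
        oof_pred[held_idx] = fold_model.predict(X[held_idx])

    # Stacking step: regress observed rewards on predictions the model
    # never trained on, yielding the calibration map.
    calibrator = LinearRegression()
    calibrator.fit(oof_pred.reshape(-1, 1), y)

    # Final reward model trained on all the data.
    reward_model = GradientBoostingRegressor(random_state=seed)
    reward_model.fit(X, y)
    return reward_model, calibrator


def estimate_policy_value(reward_model, calibrator, candidate_matches):
    """Average calibrated reward over matches a candidate policy would form."""
    raw = reward_model.predict(candidate_matches)
    return calibrator.predict(raw.reshape(-1, 1)).mean()
```

A linear calibrator is the simplest choice here; any monotone 1-D regression fit on the out-of-fold predictions would serve the same role.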
Keywords
calibration, computer games, pattern matching, recommender systems, regression analysis, Ghost Recon Online, Ubisoft, beta tests, industrial-strength application, off-policy policy evaluation, recommendation systems, reward function, stacked calibration procedure, stacked regression, training data, video game matchmaking