Pluggable Deep Thompson Sampling with Applications to Recommendation.

SDM 2023

Abstract
Thompson Sampling (TS) is an effective way to deal with the exploration-exploitation dilemma in the (contextual) multi-armed bandit problem. Due to the sophisticated relationship between contexts and rewards in real-world applications, neural networks are often preferred for modeling this relationship owing to their superior representation capacity. In this paper, we study the problem of combining neural networks with TS in a plug-and-play manner. The basic idea is to maintain a posterior distribution over the reward mean, relying on the prediction and the deep representation of the neural network for any given context. Specifically, our proposed algorithm, PlugTS (Pluggable deep Thompson Sampling), introduces no change to the network training process and requires only one additional step during serving: sampling from a univariate Gaussian distribution (by maintaining a positive definite matrix). Theoretically, we prove that PlugTS achieves a regret bound matching that of state-of-the-art neural network-based TS, while enjoying much lower computational overhead per iteration. Experimental results on public datasets covering traditional classification and recommendation tasks validate the effectiveness and efficiency of PlugTS. Furthermore, it is encouraging for real-world applications that a simplified version of PlugTS has been deployed in an industrial advertising recommender system of one of the world's largest e-commerce platforms, JD.com, achieving significant improvements in both RPM (Revenue Per Mille) and CTR (Click-Through Rate) in online A/B testing. The appendix and code are available at https://github.com/adsturing/PlugTS.
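The serving-time mechanism described above (a univariate Gaussian sample per arm, with a positive definite matrix maintained over the network's deep representation) can be illustrated with a minimal sketch. This is an assumption-laden illustration in the style of linear/neural Thompson Sampling, not the paper's actual implementation: the names `phi`, `nu`, and the exact variance form `phi^T A^{-1} phi` are hypothetical choices for exposition.

```python
import numpy as np

class PlugTSSketch:
    """Illustrative sketch of the serving-time sampling stage: the network is
    trained unchanged; at serving time we keep a positive definite matrix A
    over the deep representation phi(x) and sample each arm's score from a
    univariate Gaussian centered at the network's prediction."""

    def __init__(self, dim, nu=1.0, lam=1.0):
        self.A = lam * np.eye(dim)          # positive definite matrix
        self.A_inv = np.linalg.inv(self.A)  # cached inverse
        self.nu = nu                        # exploration scale (assumed hyperparameter)

    def sample_score(self, prediction, phi, rng):
        # Univariate Gaussian: mean = network prediction,
        # variance derived from the deep representation as phi^T A^{-1} phi.
        var = float(phi @ self.A_inv @ phi)
        return rng.normal(prediction, self.nu * np.sqrt(max(var, 0.0)))

    def update(self, phi):
        # Rank-one update of A with the served arm's representation;
        # Sherman-Morrison keeps the inverse cheap, giving the low
        # per-iteration overhead the abstract highlights.
        self.A += np.outer(phi, phi)
        Av = self.A_inv @ phi
        self.A_inv -= np.outer(Av, Av) / (1.0 + phi @ Av)
```

At serving time one would score every candidate arm with `sample_score` and play the argmax, then call `update` with the chosen arm's representation; sampling is univariate per arm, rather than drawing from a high-dimensional posterior over network weights.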
Keywords
sampling, deep