Incentivizing High Quality User Contributions: New Arm Generation in Bandit Learning.

Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (2018)

Cited by 22 | Viewed 21
Abstract
We study the problem of incentivizing high quality contributions in user generated content platforms, in which users arrive sequentially with content of unknown quality. We are interested in designing a content displaying strategy that decides which content to show to users, with the goal of maximizing user experience (i.e., the likelihood that users like the content). This goal naturally leads to a joint problem of incentivizing high quality contributions and learning the unknown content quality. To address the incentive issue, we consider a model in which users are strategic in deciding whether to contribute and are motivated by exposure, i.e., they aim to maximize the number of times their contributions are viewed. From the learning perspective, we model the content quality as the probability of obtaining positive feedback (e.g., a like or upvote) from a random user. Naturally, the platform needs to resolve the classical trade-off between exploration (collecting feedback on all content) and exploitation (displaying the best content). We formulate this problem as a multi-armed bandit problem in which the number of arms (i.e., contributions) increases over time and depends on the strategic choices of arriving users. We first show that applying standard bandit algorithms incentivizes a flood of low cost contributions, which in turn leads to linear regret. We then propose Rand UCB, which adds an additional layer of randomization on top of the UCB algorithm to address this flooding issue. We show that Rand UCB eliminates the incentives for low quality contributions, provides incentives for high quality contributions (because low quality arms receive only a bounded number of explorations), and achieves sub-linear regret with respect to displaying the current best arms.
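The abstract describes Rand UCB only at a high level: standard UCB indices over an arm set that grows as users contribute, plus an extra layer of randomization when choosing which arm to display. The sketch below is a minimal illustration of that idea, not the paper's actual algorithm; in particular, the assumption that the randomization takes the form of a uniform draw among arms whose UCB index is close to the maximum, as well as the class name RandUCBSketch and the slack parameter, are all hypothetical choices made here for illustration.

```python
import math
import random

class RandUCBSketch:
    """Hedged sketch of a UCB variant over a growing arm set, with an
    added randomization layer at selection time. The exact Rand UCB rule
    is not given in the abstract; here we ASSUME the displayed arm is
    drawn uniformly among arms whose UCB index is within `slack` of the
    current maximum (an illustrative assumption, not the paper's rule)."""

    def __init__(self, slack=0.05):
        self.slack = slack
        self.counts = []   # times each arm (contribution) was displayed
        self.rewards = []  # cumulative positive feedback per arm
        self.t = 0         # total number of displays so far

    def add_arm(self):
        """A newly arriving user contributes content: a new arm appears."""
        self.counts.append(0)
        self.rewards.append(0.0)

    def select_arm(self):
        """Pick which contribution to display next."""
        self.t += 1
        # Display each brand-new arm once to initialize its estimate.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # Standard UCB index: empirical mean plus exploration bonus.
        ucb = [self.rewards[i] / self.counts[i]
               + math.sqrt(2.0 * math.log(self.t) / self.counts[i])
               for i in range(len(self.counts))]
        best = max(ucb)
        # Randomization layer (assumed form): draw uniformly among
        # near-optimal arms rather than deterministically taking the argmax.
        candidates = [i for i, u in enumerate(ucb) if u >= best - self.slack]
        return random.choice(candidates)

    def update(self, arm, liked):
        """Record binary feedback (like / no like) for the displayed arm."""
        self.counts[arm] += 1
        self.rewards[arm] += 1.0 if liked else 0.0
```

In the paper's setting, add_arm would be called whenever a strategic user decides to contribute, and the point of the randomization layer is that low quality arms receive only a bounded amount of exploration, which removes the incentive to flood the platform with cheap contributions.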