Incentivizing Exploration by Heterogeneous Users.

COLT 2018

Abstract
We consider the problem of incentivizing exploration with heterogeneous agents. In this problem, bandit arms provide vector-valued outcomes equal to an unknown arm-specific attribute vector, perturbed by independent normally distributed noise. Agents arrive sequentially and choose arms to pull based on their own private and heterogeneous linear utility functions over attributes and the estimates of the arms' attribute vectors derived from observations of other agents' past pulls. Agents are myopic and selfish and thus would choose the arm with maximum estimated utility. A principal, knowing only the distribution from which agents' preferences are drawn, but not the specific draws, can offer incentive payments for pulling specific arms in order to encourage agents to explore underplayed arms. The principal seeks to minimize the total expected cumulative regret incurred by agents relative to their best arms, while also making a small expected cumulative payment.

We propose an algorithm that incentivizes arms played infrequently in the past whose probability of being played in the next round would be small without incentives. Under the assumption that each arm is preferred by at least a fraction $p > 0$ of agents, we show that this algorithm achieves expected cumulative regret of $O(N \exp(2/p) + N \log^3(T))$, using expected cumulative payments of $O(N^2 \exp(2/p))$. If $p$ is known or the distribution over agent preferences is discrete, the exponential term $\exp(2/p)$ can be replaced with suitable polynomials in $N$ and $1/p$. For discrete preferences, the regret dependence on $T$ can be eliminated entirely, giving constant (depending only polynomially on $N$ and $1/p$) expected regret and payments. This constant regret stands in contrast to the $\Theta(\log(T))$ dependence of regret in standard multi-armed bandit problems. It arises because even unobserved heterogeneity in agent preferences allows exploitation of arms to also explore arms fully; succinctly, heterogeneity provides free exploration.
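The abstract only sketches the mechanism at a high level. Below is a minimal, illustrative simulation of the incentive loop it describes: myopic agents with private linear utilities pick arms by estimated utility, and a principal pays a bonus on an under-played arm whose probability of being chosen in the next round would otherwise be small. The Gaussian preference distribution, the play-probability threshold, the bonus size, and all other constants are assumptions for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem constants (not from the paper)
N, d, T = 5, 3, 2000           # arms, attribute dimension, rounds
sigma = 0.5                    # std of the Gaussian outcome noise
true_attrs = rng.normal(size=(N, d))   # unknown arm-specific attribute vectors

# Running empirical estimates of each arm's attribute vector from past pulls
pull_counts = np.zeros(N)
attr_sums = np.zeros((N, d))

def estimated_attrs():
    """Empirical mean of observed outcomes per arm (zeros if never pulled)."""
    est = np.zeros((N, d))
    pulled = pull_counts > 0
    est[pulled] = attr_sums[pulled] / pull_counts[pulled, None]
    return est

# Hypothetical incentive rule: pay a bonus on an under-played arm that is
# unlikely to be the next agent's myopic choice without a payment.
PLAY_PROB_THRESHOLD = 0.05     # assumed, not from the paper
BONUS = 1.0                    # assumed payment size
MC_SAMPLES = 200               # Monte Carlo draws from the preference distribution

total_regret = 0.0
total_payment = 0.0

for t in range(T):
    est = estimated_attrs()

    # The principal knows the preference distribution (here: standard normal)
    # and estimates each arm's probability of being the next myopic choice.
    sampled_prefs = rng.normal(size=(MC_SAMPLES, d))
    choices = np.argmax(sampled_prefs @ est.T, axis=1)
    play_prob = np.bincount(choices, minlength=N) / MC_SAMPLES

    bonuses = np.zeros(N)
    under_played = int(np.argmin(pull_counts))
    if play_prob[under_played] < PLAY_PROB_THRESHOLD:
        bonuses[under_played] = BONUS

    # A new agent arrives with a private linear utility over attributes and
    # myopically picks the arm maximizing estimated utility plus any bonus.
    theta = rng.normal(size=d)
    arm = int(np.argmax(est @ theta + bonuses))

    # Outcome: true attribute vector perturbed by independent Gaussian noise.
    outcome = true_attrs[arm] + rng.normal(scale=sigma, size=d)
    attr_sums[arm] += outcome
    pull_counts[arm] += 1

    # Regret is measured against this agent's best arm under the true attributes.
    total_regret += np.max(true_attrs @ theta) - true_attrs[arm] @ theta
    total_payment += bonuses[arm]

print(f"cumulative regret ~ {total_regret:.1f}, payments ~ {total_payment:.1f}")
```

Running this sketch shows the qualitative effect the abstract highlights: because agents' preferences differ, most arms get pulled "for free" by agents who genuinely prefer them, so the principal's payments stay small and regret grows slowly once all attribute estimates stabilize.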