An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward
IEEE transactions on neural networks and learning systems, pp. 1-7, 2020.
This brief studies a variation of the stochastic multiarmed bandit (MAB) problems, where the agent knows the a priori knowledge named the near-optimal mean reward (NoMR). In common MAB problems, an agent tries to find the optimal arm without knowing the optimal mean reward. However, in more practical applications, the agent can usually ge...More
Full Text (Upload PDF)
PPT (Upload PPT)