An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward

Shangdong Yang

IEEE Transactions on Neural Networks and Learning Systems, pp. 1-7, 2020.

Abstract:

This brief studies a variant of the stochastic multiarmed bandit (MAB) problem in which the agent has a priori knowledge of the near-optimal mean reward (NoMR). In common MAB problems, an agent tries to find the optimal arm without knowing the optimal mean reward. However, in more practical applications, the agent can usually ge...
