Bio
Research interests:
Bandit theory
  Optimistic algorithms (KL-UCB, UCB-V), Thompson sampling, many-armed bandits
  Foundations of Monte-Carlo Tree Search
  Optimistic optimization (HOO, SOO, StoSOO), optimistic planning (OP-MDP, OLOP)
  Bandits in graphs and other structured spaces
Reinforcement Learning (RL)
  Analysis of Reinforcement Learning and Dynamic Programming (DP) with function approximation
  Finite-sample analysis of RL and DP (Lasso-TD, LSTD, AVI, API, BRM, compressed-LSTD)
  Policy gradient and sensitivity analysis
  Sampling methods for MDPs, Bayesian RL, POMDPs
Optimal control in continuous time
  Numerical solutions to HJB equations
  Stability analysis via viscosity solutions
  Variable resolution discretizations
Statistical learning and randomization
  Random projections for least squares regression
  Adaptive sampling for Monte-Carlo integration
  Active learning and sparse bandits