Neural-Linear Architectures for Sequential Decision Making

2019 Fifth Indian Control Conference (ICC)

Abstract
Making optimal decisions while learning in dynamic environments, such as Markov Decision Processes and Multi-armed Bandits, often requires accurate uncertainty estimates. However, learning directly from raw, high-dimensional inputs such as vision and natural language is typically done with deep neural networks, for which such accurate estimates are not available. Neural-linear algorithms address this challenge by running linear algorithms (for which accurate uncertainty estimates exist) on top of a non-linear (deep) representation that is learned directly from the raw input. Such architectures have recently been explored, showing superior performance compared to both deep and linear state-of-the-art algorithms. A practical challenge in this approach is that the linear algorithm assumes the representation is fixed over time, while the deep-learning-based representation changes as the optimization proceeds. In this talk, I will review recent neural-linear algorithms and discuss an algorithmic approach for dealing with representations that change over time. In particular, I will present a linear fitted Q-iteration algorithm that refines the weights of the last layer of a deep Q-network and improves its performance in the Arcade Learning Environment, a neural-linear Thompson Sampling algorithm for contextual bandits and deep reinforcement learning, and an action elimination algorithm for text-based games that eliminates actions based on a linear upper confidence bound.
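To make the neural-linear idea concrete, the following is a minimal sketch of Thompson Sampling with a Bayesian linear model on top of a fixed non-linear feature map. Everything here is an illustrative assumption rather than the talk's actual architecture: in the neural-linear setting the features come from the last hidden layer of a deep network, whereas below a frozen random tanh projection stands in for that learned representation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_RAW, D_FEAT, N_ARMS = 20, 8, 3

# Frozen random projection standing in for learned last-layer features.
W_proj = rng.normal(size=(D_RAW, D_FEAT))

def phi(x):
    """Fixed non-linear representation of the raw input."""
    return np.tanh(x @ W_proj)

# Per-arm Bayesian linear regression over last-layer weights:
# posterior N(Sigma_a b_a, Sigma_a) with Sigma_a = A_a^{-1},
# unit prior precision and unit observation-noise variance.
A = [np.eye(D_FEAT) for _ in range(N_ARMS)]    # precision matrices
b = [np.zeros(D_FEAT) for _ in range(N_ARMS)]  # precision-weighted sums

def select_arm(x):
    """Sample weights from each arm's posterior; play the argmax arm."""
    f = phi(x)
    scores = []
    for a in range(N_ARMS):
        cov = np.linalg.inv(A[a])
        mu = cov @ b[a]
        w = rng.multivariate_normal(mu, cov)   # Thompson sample
        scores.append(f @ w)
    return int(np.argmax(scores))

def update(x, arm, reward):
    """Conjugate Bayesian update of the chosen arm's posterior."""
    f = phi(x)
    A[arm] += np.outer(f, f)
    b[arm] += reward * f

# Toy environment (an assumption for the demo): rewards are linear
# in the fixed features, so the per-arm linear model is well-specified.
theta_true = rng.normal(size=(N_ARMS, D_FEAT))
for t in range(500):
    x = rng.normal(size=D_RAW)
    a = select_arm(x)
    r = phi(x) @ theta_true[a] + 0.1 * rng.normal()
    update(x, a, r)
```

The practical challenge discussed in the abstract arises precisely because `phi` is not fixed in practice: as the deep network trains, the features drift, invalidating the accumulated linear posterior, which is what the presented algorithms are designed to handle.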
Keywords
neural-linear architectures, optimal decisions, natural language, deep neural networks, non-linear representation, deep learning, Q-iteration algorithm, Arcade Learning Environment, deep reinforcement learning, action elimination algorithm, sequential decision making, Markov decision processes, multi-armed bandits, neural-linear Thompson Sampling algorithm, text-based games