Bi-level Optimization Method for Automatic Reward Shaping of Reinforcement Learning.

International Conference on Artificial Neural Networks and Machine Learning (ICANN), 2022

Abstract
Lowering the barrier to applying reinforcement learning hinges on making reward function design simple and convenient. At present, high-performing reinforcement learning systems mostly rely on complex rewards hand-tuned through trial and error, or on supervised learning that tracks human demonstration trajectories; both approaches increase the engineering workload. Assuming that basic mathematical elements (operators and operands) can be combined automatically through search, it should be possible to discover a compact, concise, and informative reward model. Starting from this idea, this paper explores reward functions for reinforcement learning that can be found by operator-level search, reaching optimal or suboptimal solutions that satisfy multiple optimization criteria without explicit prior knowledge. Building on AutoML-zero, we implement an automatic operator-level reward-function search based on evolutionary search, and the reward functions found satisfy the constraint conditions and are equal to or better than human-designed ones.
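The AutoML-zero-style search described above can be sketched as evolving short instruction sequences over a small register file, where each program computes a shaped reward from state features. The following is a minimal, self-contained illustration, not the paper's actual implementation: the operator set, register count, mutation scheme, and the toy fitness target (matching a hand-written reward expression) are all assumptions made for demonstration.

```python
import random

# Basic operator set, AutoML-Zero style: each instruction is
# (op, dst, src_a, src_b) acting on a small register file.
# "neg" is unary and ignores its second operand.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "neg": lambda a, b: -a,
}

N_REGS = 4  # registers; inputs are loaded into the first slots

def random_instr():
    """Sample one random instruction over the register file."""
    return (random.choice(list(OPS)), random.randrange(N_REGS),
            random.randrange(N_REGS), random.randrange(N_REGS))

def execute(program, inputs):
    """Run a program on the given inputs; register 0 holds the reward."""
    regs = [0.0] * N_REGS
    regs[:len(inputs)] = inputs
    for op, dst, a, b in program:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs[0]

def fitness(program, samples):
    """Toy objective: match a hand-designed target reward r = x*y - y.
    Negative squared error, so higher is better (0 is a perfect match)."""
    err = 0.0
    for x, y in samples:
        err += (execute(program, [x, y]) - (x * y - y)) ** 2
    return -err

def evolve(pop_size=20, prog_len=4, generations=50, seed=0):
    """Simple (mu + lambda) evolutionary search with point mutation."""
    random.seed(seed)
    samples = [(random.uniform(-1, 1), random.uniform(-1, 1))
               for _ in range(20)]
    pop = [[random_instr() for _ in range(prog_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, samples), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            child[random.randrange(prog_len)] = random_instr()
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda p: fitness(p, samples))
    return best, fitness(best, samples)
```

In a real reward-shaping setting, the fitness of a candidate program would be the return achieved by an RL agent trained with that reward, evaluated against the multi-objective constraints; the toy regression target here only illustrates the combinatorial search over operators.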
Keywords
Credit assignment,Reward shaping,AutoML-zero,Evolutionary search